Sccs 2SLS-IS downloads

From InterSciWiki
Jump to: navigation, search

EduB -


Giorgio Gosti drawVar.R modified, SFI Working group in Causal Analysis. Lines are SCCS culturally closest neighbors (Murdock and White 1969), red 4 is moralizing gods, sccs$238==4
DRW: The substitute for HiGod4 (which is not yet "published" by its author) is sccs$v238
the others are available as
sccs$No_Rain or sccs$water (No_Rain_Dry is a composite of this and another variable )
sccs$Missions
Nonetheless 70% of the students completed their study in 4 class days each 2 hours. Completion in 8 hours is a considerable accomplishment for which instructor Feng Ren is to be congratulated. He notes: btw the new program is much easier to be modified, a lot thanks to Scott.

Contents

For students (Xi'an 8 hour application on network approaches to cross-cultural research, ...), can be applied to any network data with node attributes

Read the two required articles and browse others of interest and dialogue with your instructor(s). Try to understand how the first article connects regression analysis with causal graphs. Discuss with instructor. In reading the article try to understand how the three main concepts in the abstract related to the topics in the main texts.

In reading the second article (Eff and Dow 2009) please recognize that their discussion of imputation for missing values and the use of W matrices to help control for the nonindependence of cases (autocorrelation) applies to the R scripts that we use in the first article, and the distinction between there 2SLS prototype and the three new features of the 2SLS-IS scripts in R. It is only those latter scripts that you will use in the class.

You can demonstrate spatial autocorrelation to yourself without any special preparation. Just install R on your computer and run (start) R. Copy and paste the following script directly into R. It generates a W matrix for normalized 'closeness' of neighbors (see map above).

Wlink=matrix(data = 0, nrow = 186, ncol = 186)
for(i in 1:185) {
Wlink[i,i+1]=1
Wlink[i+1,i]=1}
for(i in 1:184) {
Wlink[i,i+2]=.8
Wlink[i+2,i]=.8}
for(i in 1:183) {
Wlink[i,i+3]=.4
Wlink[i+3,i]=.4}
for(i in 1:182) {
Wlink[i,i+4]=.2
Wlink[i+4,i]=.2}
#Wlink=Wlink^.5 #can adjust elements, e.g., Wlink^2
Wlink=Wlink/rowSums(Wlink)

Next to the zeros on the diagonal the high values are close to the diagonal. Check this for the the early and last elements of the W matrix and check that rows sum to 1:

Wlink[1:3,1:3]
Wlink[184:186,184:186]
sum(Wlink[55,1:186]) #Exercise: try changing 55 to any other number between 1 and 186: do all rows sum to 1?

Read Appendix 1 of White et al. 2011 (first required reading) to understand how the matrix product y %*% W = Wy transforms a dependent variable y (similarly for independent variables x in a set of columns X, so that X %*% W = XW are a set of new transformed variables), and understand how the transformations of these variables represent for each case a weighted average of the values on that variable of the neighbors' values. The first stage of 2SLS regresses the Wy of the neighbors weighted values on the dependent variable on the XW weighted values on the independent variables. The subtraction of that neighborhood effect, y - Wy, ignoring the error term of the regression, becomes the dependent variables in the second stage of 2SLS regression. If the error terms are random at this second stage, then the autocorrelation effect has been removed. Only then do significance tests become valid, and these are the probability values you will see for each variable as the outcome of 2SLS regression. Since you have just programmed a W matrix, Wlink, and you have downloaded a database in which y<-sccs$v3 is coded for all cases, try multiplying ynew = y %*% Wlink and compute max(ynew); max(y); min(ynew); min(ynew). Note that the max and min for ynew are the same as for y. Discuss what this means in terms of first stage and second stage 2SLS regression.

Now let y be a dependent variable for 186 societies with different values

y = NULL
for(i in 1:6) y = c(y,rep(i,31))
#y
W<-Wlink
Wy <- W %*% y
cor(y,Wy)
          [,1]
[1,] 0.9967493 #so you see that neighbors predict our dependent variable. Now add lots of randomness and you'll see the neighbors predict the dependent variable
y=y+runif(186, 0, 3)
Wy <- W %*% y
cor(y,Wy)
         [,1]
[1,] 0.8733619 #see how easy it is to write scripts in R?
round(y,0) #you cant see the pattern of y so clearly now, can you?

3 Now you can come to understand what two_stage_OLS.R (2SLS) means in a regression framework. In the first stage OLS, Wy=\delta_0 + \sum_{j=1}^N\delta_j (X_jW) + \epsilon^' . Using the estimate of Wy, \hat{Wy}=Wy - \epsilon^' , then in the second stage OLS (y-\hat{Wy}))=\beta_0 + \sum_{j=1}^N\beta_j X_j + \epsilon (you too can learn to write Math font on a wiki). Because of OLS the program is extremely fast except for the minute or so of multiple imputation of missing data which is what you see on the screen until the results appear (unless there is an error message: see your instructor).

Now #Set up your computer root and directories, then test whether the models run on your computer, either a Mac or a PC. The scripts run very quickly because OLS is a very simple procedure. Your directories should form a tree (the root will vary), with the files in the appropriate subdirectories. You can run the sample programs from your installation for by copy and paste off this wiki page. One you are modifying a ...create...R file it must be in an appropriate directory on your computer. No other files need to be edited. The result will appear on the screen and in a *.csv file (readable in excel) several levels below the subdirectory of your ...create...R file and will bear the name you have it in the ...create...R file.

Assignment 1: Find, print, and study one of the two main models with files that include source("http:/... R/source/model/create_1..."). If you choose the Brown and Eff (2011) model for Moral Gods then read that article from the optional list. If you choose the White, Feng, Gosti and Oztan 2011 model for Moral Gods with variables such as AnimXwealth and FxCmtyWages, it is discussed in our first required reading.

... more to come.

4 More on #Dropbox downloads and install

5 #Refer to the sccs codebook to choose variables for exploring further predictors in an existing mode or a model with a new dependent variable.

Read the articles

REQUIRED

OPTIONAL

Set up your computer root and directories, then test

  • Choose a root directory and create subdirectories in which to place your files
/sccs (program output is automatically saved here)
/R_3_s_ols (stores the two_stage_ols.R and average.R scripts)
/examples (contains only the create and data subdirectories)
/data (stores vaux.Rdata, dist25wm.dta, and langwm.dta needed for imputation and autocorrelation)
/create (create your own subdirectories here for different models: the only scripts you will edit will be here)
/R_run (stores run_model_100.R and run_model_79.R for main model and multiple inferential statistics runs)
load(url('http://intersci.ss.uci.edu/wiki/R/source/sccsdata.Rdata')) #check if this data source downloads for you
sccsdata$v1         #factorsccsdata with variable lables
sccs<-data.frame(sapply(sccsdata,function(x) as.numeric(x)))
sccs$v1               #numericsccsdata
setwd('/Users/drwhite/Documents/3sls/sccs/')
save(sccsdata,file='sccsdata.Rdata')  #save sccs data to YOUR working directory at setwd() getwd()
Available libraries           COPY the next six lines to your own wiki page, followed by one or more model below
setwd('/Users/drwhite/Documents/3sls/sccs/')  #Mac: your root directory should end in .../sccs/
library(mice)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
library(relaimpo) 	# Rp2 relative importance (to be added)
library(gplots)		# optional, for related scripts
library(psychometric) 	# optional, for related scripts 
library(psych)		# optional, for related scripts
library(multilevel) 	# optional, for related scripts
library(maptools) 	# optional, for related scripts 
library(Hmisc)		# functions useful for data analysis, high-level graphics
library(vegan)		# statistical pack for environmental studies
library(forward)	# search approach to robust analysis in linear and generalized
library(AER)		# Applied Econometrics Routines

Full models from EduR_2 Value of Children Sccs_2SLS-IS_downloads#REQUIRED Appendix 3

#THIS WORKS -- Eff and Dwo 2009 Model EduR-1.0 for VALUE OF CHILDREN v473-476 R2~.107
setwd('/Users/drwhite/Documents/3sls/sccs/')  #Mac: your root directory should end in .../sccs/
load('sccsdata.Rdata') #Highest numeric variables are sccs$v2002
sccs<-data.frame(sapply(sccsdata,function(x) as.numeric(x)))
source("http://intersci.ss.uci.edu/wiki/R/source/model/CreateModelValueSDWchildrenLang.R")  #2nd source edit only this file for your model 
source("http://intersci.ss.uci.edu/wiki/R/two_stage_ols_full.R")                #3rd source program (2SLS) FULL imputation
            Copy and paste to here, check or errors and correct, then copy and paste the 2nd set 
source("http://intersci.ss.uci.edu/wiki/R/source/run_model_100.R") #4th source program (run)
source("http://intersci.ss.uci.edu/wiki/R/averageAll.R")     		    #6th source program (save imputed data) 
            Copy and paste to here, check or errors and correct, then copy your results to your page on the wiki, and rename your *.csv output file 
output=average(ols_stats$imputed_data)					#output but not in data frame
'class(output)'                               			#output now in a 'class' to make data frame
output = as.data.frame(output)    		#Once it's a data frame it should behave just like my_sccs
output$eeextwar; my_sccs$eeextwar					#show that missing data are fully imputed
output$FxCmtyWages; my_sccs$FxCmtyWages					#show that missing data are fully imputed

You can also run Wikipedia:Generalized least squares as a replacement for two_stage.ols.R. GLS is applied when the variances of the observations are unequal (heteroscedasticity), or when there is a certain degree of correlation between the observations. In these cases ordinary least squares can be statistically inefficient, or even give misleading inferences. Similarly, you can run the Wikipedia:Generalized linear model (GLM) and Wikipedia:Generalized_linear_mixed_model (GLMM).

 source("http://intersci.ss.uci.edu/wiki/R/two_stage_gls.R")
 source("http://intersci.ss.uci.edu/wiki/R/two_stage_glm.R")
 source("http://intersci.ss.uci.edu/wiki/R/two_stage_glmm.R")
 source("http://intersci.ss.uci.edu/wiki/R/two_stage_gls_full.R")
 source("http://intersci.ss.uci.edu/wiki/R/two_stage_glm_full.R")
 source("http://intersci.ss.uci.edu/wiki/R/two_stage_glmm_full.R")

Full models from EduR_1.5

EduR-1.5#create_EduR_1.2Fcreate_EduR_1.5DistBrownEffSign.R_PCAP_with_1stPCA_depvar_v238

source("examples/create/create_EduR_1/create_EduR_1.5DistBrownEffSign.R")

Full models from wiki/R

#THIS WORKS -- Brown and Eff Model for MORAL GODS v238 R2~.409
setwd('/Users/drwhite/Documents/3sls/sccs/')  #Mac: your root directory should end in .../sccs/
load('sccsdata.Rdata') #Highest numeric variables are sccs$v2002
sccs<-data.frame(sapply(sccsdata,function(x) as.numeric(x)))
source("http://intersci.ss.uci.edu/wiki/R/source/model/create_EduR_1.5DistBrownEffSign.R")  #2nd source edit only this file for your model 
source("http://intersci.ss.uci.edu/wiki/R/two_stage_ols_full.R")                #3rd source program (2SLS) FULL imputation
            Copy and paste to here, check or errors and correct, then copy and paste the 2nd set 
source("http://intersci.ss.uci.edu/wiki/R/source/run_model_100.R") #4th source program (run)
source("http://intersci.ss.uci.edu/wiki/R/averageAll.R")     		    #6th source program (save imputed data) 
            Copy and paste to here, check or errors and correct, then copy your results to your page on the wiki, and rename your *.csv output file 
output=average(ols_stats$imputed_data)					#output but not in data frame
'class(output)'                               			#output now in a 'class' to make data frame
output = as.data.frame(output)    		#Once it's a data frame it should behave just like my_sccs
output$eeextwar; my_sccs$eeextwar					#show that missing data are fully imputed
#THIS WORKS -- Brown and Eff Model for MORAL GODS v238 R2~.409
setwd('/Users/drwhite/Documents/3sls/sccs/')  #Mac: your root directory should end in .../sccs/
load('sccsdata.Rdata') #Highest numeric variables are sccs$v2002
sccs<-data.frame(sapply(sccsdata,function(x) as.numeric(x)))
source("http://intersci.ss.uci.edu/wiki/R/source/model/create_EduR_1.5DistBrownEffSign.R")  #2nd source edit only this file for your model 
source("http://intersci.ss.uci.edu/wiki/R/two_stage_ols.R")                #3rd source program (2SLS) FULL imputation
            Copy and paste to here, check or errors and correct, then copy and paste the 2nd set 
source("http://intersci.ss.uci.edu/wiki/R/source/run_model_100.R") #4th source program (run)
source("http://intersci.ss.uci.edu/wiki/R/averageAll.R")     		    #6th source program (save imputed data) 
            Copy and paste to here, check or errors and correct, then copy your results to your page on the wiki, and rename your *.csv output file 
output=average(ols_stats$imputed_data)					#output but not in data frame
'class(output)'                               			#output now in a class to make data frame
output = as.data.frame(output)    		#Once it's a data frame it should behave just like my_sccs
output$eeextwar; my_sccs$eeextwar					#show that missing data are fully imputed
#THIS WORKS AND CAN BE USED White et al Model for MORAL GODS v238 overfitted R2=.469 with 7 variables + distance
load('sccsdata.Rdata') #Highest numeric variables are sccs$v2002
sccs<-data.frame(sapply(sccsdata,function(x) as.numeric(x)))
source("http://intersci.ss.uci.edu/wiki/R/source/model/create_EduR_1.5DistB_EffHiddenVariablesNew.R") #2nd source program  
source("http://intersci.ss.uci.edu/wiki/R/two_stage_ols.R")                #3rd source program (2SLS) 
source("http://intersci.ss.uci.edu/wiki/R/source/run_model_100.R") #4th source program (run) R2=.454 with 4 variables + distance, no Snarey variables
source("http://intersci.ss.uci.edu/wiki/R/source/averageAll.R")
output=average(ols_stats$imputed_data)					#output but not in data frame
'class(output)'                               			#output now in a class to make data frame
output = as.data.frame(output)    		#Once it's a data frame it should behave just like my_sccs
output$eextwar; my_sccs$eextwar				#show that missing data are fully imputed

These dont work

##source("http://intersci.ss.uci.edu/wiki/R/source/model/create_EduR_1.5DistB_EffHiddenVariablesNew.R") #2nd source program  
source("http://intersci.ss.uci.edu/wiki/R/two_stage_ols.R")  #3rd source program (2SLS) 
source("http://intersci.ss.uci.edu/wiki/R/source/run_model_100.R") #4th source program (run)

Possible error messages: errors and warnings

Doug_White_Modeling_Strategy

  • This error is caused by not having the Snarey variable $Rain, will occur for other Snarey variables unless using the right load(...

Error in data.frame(dep_var = sccs$HiGod4, evileye = sccs$v1189, no_rain = sccs$Rain,  : arguments imply differing number of rows: 0, 186

  • Error in `[.data.frame`(y, train_idxs) : undefined columns selected -- NO dep_var=sccs$v999 defined
  • Error in as.matrix(dataset_i[, indep_vars]): error in evaluating the argument 'x' in selecting a method for function 'as.matrix' THERE WAS AN UNDEFINED VARIABLE, "superjh", which I had confused with "SuperjhWriting", the new depvar. Then I had to remove PCsize and PCsize2 which were composed of a PCA of "superjh" and another variable.
  • Error in linearHypothesis.lm(lm.unrestricted, drop_vars, test = "Chisq") : there are aliased coefficients in the model. (2 variables have same sccs$vnumber
This error is in indep_vars=c(.... )
  • When you cant get an error our of indep_vars (same as lm.unrestricted) copy the variables from restrict_vars into the first line, remove those elsewhere, allow one other variable, comment out the rest and replace as needed for later models. Always keep one extra variable in indep_vars
  • Error in linearHypothesis.lm(lm.restricted, ]
This error is in restrict_vars=c(.... )
  • Warning message: package 'foreign' was built under R version 2.13.2: reinstall library(foreign)
  • Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection: your requested file name is not found
  • In cbind(combined, round(tot_cor, sig_digits)) :
 number of rows of result is not a multiple of vector length (arg 2) ???
  • FURTHER INSTRUCTIONS FIXING ERRORS
  • When you put in a new dep_var in my_sccs, take out its name in my_sccs, indep_vars=c(.... ), restrict_vars=c(.... )
  • Take care when you remove a variable label in my_sccs, indep_vars=c(.... ), restrict_vars=c(.... ), no comments inside parens
  • When editing your create file and getting errors:
repeat create only (see if errors), correct, run again along
when working then repeat the OLS and the RUN.R
repeat in order above if error
  • WARNING: THIS ONLY BECAUSE library(sna) was called
Error in write.table(title, file = summary_results_file, append = F, col.names = F,  :
invalid 'row.names' specification

Saving results on the wiki

  • Your wiki page should be opened by edit to a SECTION that contains a program. Copy program. In R, Paste.
  • Use <wiki>==Heading==</nowiki>
to create heading

Dropbox downloads and install - backup access to files and scripts

  • for the zip files:- RightClick to download, dbl click zip file, dist25wm.dta appears in that directory
zip files made by mac FINDER - FILE - Compress

Refer to the sccs codebook to choose variables

Personal tools