# Sccs 2SLS-IS downloads


- This is an instructional page intended to work for classes taught across continents. Related pages: Sccs 2SLS-IS downloads for PC; Xi'an students; Xi'an invitation; SCCS R package; Z-0004 Protected backup; 2SLS-IS; Doug White Modeling Strategy; Draft on Moral gods; SFI2011 project.
- A problem in current work: the No_Rain_Dry, Missions, and HiGod4 variables are not in the "sccsdata" file we are working with.

  - DRW: The substitute for HiGod4 (which has not yet been "published" by its author) is sccs$v238.
  - The others are available as:
    - sccs$No_Rain or sccs$water (No_Rain_Dry is a composite of this and another variable)
    - sccs$Missions

- Nonetheless, 70% of the students completed their study in four class days of two hours each. Completion in 8 hours is a considerable accomplishment, for which instructor Feng Ren is to be congratulated. He notes: "btw the new program is much easier to be modified, a lot thanks to Scott."

- The next step in this project is undertaken here: to make this SCCS and network research practicum available to any seminar of up to 20 students, working from lab or personal computers with a link to this wiki.

## For students (Xi'an 8-hour application on network approaches to cross-cultural research, ...): applicable to any network data with node attributes

Read the two required articles, browse others of interest, and discuss them with your instructor(s). Try to understand how the first article connects regression analysis with causal graphs, and discuss this with your instructor. As you read the article, try to understand how the three main concepts in the abstract relate to the topics in the main text.

As you read the second article (Eff and Dow 2009), note that its discussion of imputing missing values and of using W matrices to help control for the nonindependence of cases (autocorrelation) applies to the R scripts we use in the first article. Also note the distinction between their 2SLS prototype and the three new features of the 2SLS-IS scripts in R. You will use only the latter scripts in class.

You can demonstrate spatial autocorrelation to yourself without any special preparation. Install R on your computer, start it, and copy and paste the following script directly into the R console. It generates a W matrix of normalized 'closeness' between neighbors (see map above). Cases whose SCCS numbers are 1, 2, 3, or 4 apart are treated as neighbors, with weights 1, .8, .4, and .2 respectively.

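```r
Wlink = matrix(data = 0, nrow = 186, ncol = 186)
for (i in 1:185) { Wlink[i, i+1] = 1;  Wlink[i+1, i] = 1 }   # neighbors 1 step apart
for (i in 1:184) { Wlink[i, i+2] = .8; Wlink[i+2, i] = .8 }  # 2 steps apart
for (i in 1:183) { Wlink[i, i+3] = .4; Wlink[i+3, i] = .4 }  # 3 steps apart
for (i in 1:182) { Wlink[i, i+4] = .2; Wlink[i+4, i] = .2 }  # 4 steps apart
#Wlink = Wlink^.5  # can adjust elements, e.g., Wlink^2
Wlink = Wlink / rowSums(Wlink)  # normalize so that each row sums to 1
```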

The diagonal is zero, and the largest values lie right next to it. Check this for the first and last elements of the W matrix, and check that the rows sum to 1:

```r
Wlink[1:3, 1:3]
Wlink[184:186, 184:186]
sum(Wlink[55, 1:186])  # Exercise: try changing 55 to any other number between 1 and 186: do all rows sum to 1?
```

Read Appendix 1 of White et al. 2011 (the first required reading). It explains how the matrix product W %*% y = Wy transforms a dependent variable y. The independent variables in the columns of X are transformed the same way, so that W %*% X = WX is a set of new transformed variables. For each case, a transformed variable is a weighted average of the neighbors' values on that variable.

The first stage of 2SLS regresses Wy (the neighbors' weighted values on the dependent variable) on WX (the weighted values on the independent variables). Subtracting that neighborhood effect, y - Wy, and ignoring the error term of the regression, gives the dependent variable for the second stage of 2SLS regression. If the error terms are random at this second stage, the autocorrelation effect has been removed. Only then do significance tests become valid, and these are the probability values you will see for each variable in the 2SLS output.

You have just programmed a W matrix, Wlink, and you have downloaded a database in which y <- sccs$v3 is coded for all cases. Try ynew <- Wlink %*% y and compute max(ynew); max(y); min(ynew); min(y). Each element of ynew is a weighted average of neighbors' values, so ynew always stays within the range of y. Discuss what this means for first-stage and second-stage 2SLS regression.
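The range property can be checked directly. The following is a minimal sketch that rebuilds Wlink so it runs on its own. It uses a randomly generated y as a hypothetical stand-in for sccs$v3, not the real data. Note the order of the product: W %*% y gives each case's weighted average of its neighbors, while y %*% W weights by columns and generally gives a different vector.

```r
n <- 186
wts <- c(1, .8, .4, .2)                 # weights for neighbors 1 to 4 positions away
Wlink <- matrix(0, nrow = n, ncol = n)
for (k in seq_along(wts)) {
  for (i in 1:(n - k)) {
    Wlink[i, i + k] <- wts[k]
    Wlink[i + k, i] <- wts[k]
  }
}
Wlink <- Wlink / rowSums(Wlink)         # row-normalize, as in the script above

set.seed(3)
y <- sample(1:5, n, replace = TRUE)     # hypothetical stand-in for sccs$v3
Wy <- as.vector(Wlink %*% y)            # neighbors' weighted average for each case

range(y)
range(Wy)                               # never outside range(y), up to rounding
sum(Wlink[1, ] * y) - Wy[1]             # case 1 computed by hand: zero up to rounding
```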

Now let y be a dependent variable for the 186 societies, with values that differ in blocks along the sample: six blocks of 31 societies, with values 1 through 6.

```r
y = NULL
for (i in 1:6) y = c(y, rep(i, 31))  # six blocks of 31 societies with values 1 to 6
#y
W <- Wlink
Wy <- W %*% y
cor(y, Wy)
#           [,1]
# [1,] 0.9967493
# so you see that neighbors predict our dependent variable.
# Now add lots of randomness and you'll see that the neighbors still predict the dependent variable:
y = y + runif(186, 0, 3)
Wy <- W %*% y
cor(y, Wy)
#           [,1]
# [1,] 0.8733619   (varies from run to run, since runif() draws random numbers)
# see how easy it is to write scripts in R?
round(y, 0)  # you can't see the pattern of y so clearly now, can you?
```
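The correlation cor(y, Wy) above is an informal measure. A standard statistic for the same idea is Moran's I:

$I = \frac{n}{S_0}\,\frac{z^\top W z}{z^\top z}$, where $z = y - \bar{y}$ and $S_0$ is the sum of all weights.

The spdep package loaded below provides moran.test() for this. The base-R sketch here computes I by hand for the block-valued y. The function names make_band_w and moran_i are hypothetical and are not part of the course scripts.

```r
# Hypothetical helper: the same banded, row-normalized W as Wlink above
make_band_w <- function(n = 186, weights = c(1, .8, .4, .2)) {
  W <- matrix(0, nrow = n, ncol = n)
  for (k in seq_along(weights)) {
    for (i in 1:(n - k)) {
      W[i, i + k] <- weights[k]
      W[i + k, i] <- weights[k]
    }
  }
  W / rowSums(W)  # divide each row by its sum
}

# Hypothetical helper: Moran's I for a numeric vector y and weight matrix W
moran_i <- function(y, W) {
  z <- y - mean(y)
  (length(y) / sum(W)) * sum(z * (W %*% z)) / sum(z^2)
}

W <- make_band_w()
y_block <- rep(1:6, each = 31)   # same six blocks as the loop above
moran_i(y_block, W)              # large and positive: neighbors share values
set.seed(1)
moran_i(sample(y_block), W)      # near zero once the order is scrambled
```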

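Now you can come to understand what two_stage_ols.R (2SLS) means in a regression framework.

- **First stage (OLS).** The neighbors' weighted values $Wy$ are regressed on the weighted independent variables $WX$: $Wy = WX\gamma + u$. This gives the estimate $\widehat{Wy} = WX\hat{\gamma}$.
- **Second stage (OLS).** Using this estimate of $Wy$, the model is $y = \rho\,\widehat{Wy} + X\beta + \varepsilon$ (you too can learn to write math on a wiki).

Because both stages are OLS, the program is extremely fast. The exception is the minute or so of multiple imputation of missing data, which is what you see on the screen until the results appear (unless there is an error message: see your instructor).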
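The two stages can be sketched in a few lines of base R. This is not the code in two_stage_ols.R. It simulates hypothetical data from a spatial-lag model $y = \rho Wy + \beta x + \varepsilon$ with $\rho = 0.5$ and $\beta = 2$, and uses a banded W like Wlink. Following textbook 2SLS, it includes x as well as Wx in the first stage; that choice of instruments is an assumption of the sketch. The standard errors that the second lm() call would print are not valid 2SLS standard errors, so use the sketch only to see the point estimates.

```r
set.seed(42)
n <- 186
wts <- c(1, .8, .4, .2)
W <- matrix(0, nrow = n, ncol = n)
for (k in seq_along(wts)) {
  for (i in 1:(n - k)) { W[i, i + k] <- wts[k]; W[i + k, i] <- wts[k] }
}
W <- W / rowSums(W)

# Hypothetical data from y = rho*W y + beta*x + e, i.e. y = (I - rho W)^(-1) (beta*x + e)
rho <- 0.5
beta <- 2
x <- rnorm(n)
y <- as.vector(solve(diag(n) - rho * W, beta * x + rnorm(n)))

Wy <- as.vector(W %*% y)
Wx <- as.vector(W %*% x)

# Stage 1: regress Wy on the instruments and keep the fitted values
stage1 <- lm(Wy ~ x + Wx)
Wy_hat <- fitted(stage1)

# Stage 2: regress y on the fitted Wy and x
stage2 <- lm(y ~ Wy_hat + x)
coef(stage2)   # estimates should land near rho = 0.5 and beta = 2
```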

Now set up your computer root and directories (see "Set up your computer root and directories, then test" below). Then test whether the models run on your computer, either a Mac or a PC. The scripts run very quickly because OLS is a very simple procedure.

Your directories should form a tree (the root will vary), with the files in the appropriate subdirectories. You can run the sample programs from your installation or by copying and pasting from this wiki page. Once you start modifying a **...create...R** file, it must be in an appropriate directory on your computer; no other files need to be edited. The results will appear on the screen and in a *.csv file (readable in Excel) several levels below the subdirectory of your **...create...R** file. That file will bear the name you give it in the **...create...R** file.

**Assignment 1**: Find, print, and study one of the two main models whose code includes source("http:/... R/source/model/create_1..."). If you choose the Brown and Eff (2010) model for Moral Gods, read that article from the optional list. If you choose the White, Feng, Gosti and Oztan 2011 model for Moral Gods, with variables such as AnimXwealth and FxCmtyWages, it is discussed in our first required reading.

... more to come.

For more on Dropbox downloads and installation, see "Dropbox downloads and install" below.

Refer to the SCCS codebook (see "Refer to the sccs codebook to choose variables" below) to choose variables. You can use them to explore further predictors in an existing model, or to build a model with a new dependent variable.

## Read the articles

#### REQUIRED

- White, Feng, Gosti and Oztan. Two-Stage Least Squares and Inferential Statistics (2SLS-IS) for Fast and Robust OLS in R. Appendix 3 shows the Eff and Dow (2009) results. Updates are available on the DRWhite home page (click the last graphic with this title).
- Eff, E. Anthon, and Malcolm Dow. **2009** (pdf quick download). How to Deal with Missing Data and Galton's Problem in Cross-Cultural Survey Research: A Primer for R. Structure and Dynamics: eJournal of Anthropological and Related Sciences 3(3), art. 1. Eff and Dow 2009: the code.

#### OPTIONAL

- Brown, Christian, and E. Anthon Eff. **2010** (pdf). The State and the Supernatural: Support for Prosocial Behavior. Structure and Dynamics: eJournal of Anthropological and Related Sciences 4(1), art. 1. Brown and Eff 2010: the code. The article comments on Snarey, John R. 1996. The Natural Environment's Impact upon Religious Ethics: A Cross-Cultural Study. Journal for the Scientific Study of Religion 35(2):85-96.
- *Rccs* is an earlier modification of Eff and Dow (2009). It is not recommended and has been superseded by 2SLS-IS.
- Dow, Malcolm M. 2007. Galton's Problem as Multiple Network Autocorrelation Effects: Cultural Trait Transmission and Ecological Constraint. Cross-Cultural Research 41(4):336-363. The Cultural Trait Transmission variables here correspond to vertical (language family proximity) and horizontal (spatial proximity) transmission in the Standard Cross-Cultural Sample.
- Dow, Malcolm M. 2008. Network Autocorrelation Regression with Binary and Ordinal Dependent Variables. Cross-Cultural Research 42(4):394-419.
- Dow, Malcolm M., and E. Anthon Eff. 2009a. Cultural Trait Transmission and Missing Data as Sources of Bias in Cross-Cultural Survey Research: Explanations of Polygyny Re-examined. Cross-Cultural Research 43(2):134-151. The Cultural Trait Transmission variables here are developed in Dow (2007) and correspond to vertical (language family proximity) and horizontal (spatial proximity) transmission in the Standard Cross-Cultural Sample.
- Dow, Malcolm M. 2008a. Global, Regional, and Local Network Autocorrelation in the Standard Cross-Cultural Sample. Cross-Cultural Research 42(2):148-171. **Pp. 158-159 show variables in the SCCS with high spatial autocorrelation.**
- Dow, Malcolm M., and E. Anthon Eff. 2009. Multiple Imputation of Missing Data in Cross-Cultural Samples. Cross-Cultural Research 43(3):206-229. http://ccr.sagepub.com/cgi/content/abstract/43/3/206
- Eff, E. Anthon. 2008. "Weight Matrices for Cultural Proximity: Deriving Weights from a Language Phylogeny." Structure and Dynamics: eJournal of Anthropological and Related Sciences 3(2), Article 9. http://repositories.cdlib.org/imbs/socdyn/sdeas/vol3/iss2/art9
- Dow, Malcolm M., Michael L. Burton, and Douglas R. White. 1982. Network Autocorrelation: A Simulation Study of a Foundational Problem in the Social Sciences. Social Networks 4(2):169-200. See also: Galton's problem and autocorrelation.
- Dow, Malcolm M., Douglas R. White, and Michael L. Burton. 1982. Multivariate Modeling with Interdependent Network Data. Cross-Cultural Research 17(3-4):216-245.

## Set up your computer root and directories, then test

- Choose a root directory and create the following subdirectories for your files:

  - /sccs (program output is automatically saved here)
    - /R_3_s_ols (stores the two_stage_ols.R and average.R scripts)
    - /examples (contains only the create and data subdirectories)
      - /data (stores vaux.Rdata, dist25wm.dta, and langwm.dta, needed for imputation and autocorrelation)
      - /create (create your own subdirectories here for different models: the only scripts you will edit will be here)
        - /R_run (stores run_model_100.R and run_model_79.R, for the main model and for multiple inferential statistics runs)
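The tree above can also be created from R. The following is a minimal sketch based on the paths used on this page: sccs/R_3_s_ols, sccs/examples/data, and sccs/examples/create/R_run. If your instructor uses a different layout, adjust the paths. make_sccs_tree is a hypothetical helper, not one of the course scripts, and "~/Documents/3sls" is only an example root.

```r
# Hypothetical helper: create the course directory tree under a root of your choice
make_sccs_tree <- function(root) {
  dirs <- file.path(root, "sccs",
                    c("R_3_s_ols", "examples/data", "examples/create/R_run"))
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(dirs)
}

# Try it in a temporary directory first
dirs <- make_sccs_tree(tempdir())
dir.exists(dirs)   # all TRUE
# For real use, pass your own root, then set the working directory to .../sccs/:
# make_sccs_tree("~/Documents/3sls")
# setwd("~/Documents/3sls/sccs/")
```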

```r
load(url('http://intersci.ss.uci.edu/wiki/R/source/sccsdata.Rdata'))  # check whether this data source downloads for you
sccsdata$v1   # factor sccsdata, with variable labels
sccs <- data.frame(sapply(sccsdata, function(x) as.numeric(x)))
sccs$v1       # numeric sccs data
setwd('/Users/drwhite/Documents/3sls/sccs/')  # replace with your own directory ending in .../sccs/
save(sccsdata, file = 'sccsdata.Rdata')      # save the sccs data to YOUR working directory set by setwd()
getwd()
```
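The sapply() line converts every column to numbers. For a factor, as.numeric() returns each value's position in levels(), not a number read from the label. It is therefore worth checking levels() for any variable you use. Below is a toy illustration with hypothetical labels (not real SCCS codes).

```r
# Hypothetical factor standing in for one labelled column of sccsdata
f <- factor(c("present", "absent", "present"), levels = c("absent", "present"))
levels(f)        # "absent" "present"
as.numeric(f)    # 2 1 2: positions in levels(f)

# Labels that look like numbers are NOT parsed: use as.character() first
g <- factor(c("10", "5", "10"))
as.numeric(g)                 # 1 2 1 (levels sort as "10", "5")
as.numeric(as.character(g))   # 10 5 10
```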

- http://dl.dropbox.com/u/37813961/sccs2.zip (note: this zip does not decompress)

Available libraries: COPY the next six lines to your own wiki page, followed by one or more of the models below.

```r
setwd('/Users/drwhite/Documents/3sls/sccs/')  # Mac: your root directory should end in .../sccs/
library(mice)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
library(relaimpo)      # Rp2 relative importance (to be added)
library(gplots)        # optional, for related scripts
library(psychometric)  # optional, for related scripts
library(psych)         # optional, for related scripts
library(multilevel)    # optional, for related scripts
library(maptools)      # optional, for related scripts
library(Hmisc)         # functions useful for data analysis, high-level graphics
library(vegan)         # statistical pack for environmental studies
library(forward)       # forward search approach to robust analysis in linear and generalized linear models
library(AER)           # Applied Econometrics Routines
```
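The library() calls fail if a package is not installed. The small sketch below lists which of the five required packages are missing, using base R's requireNamespace(), so you can install them with install.packages(). missing_packages is a hypothetical helper, not part of the course scripts.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be loaded
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

required <- c("mice", "spdep", "car", "lmtest", "sandwich")
todo <- missing_packages(required)
todo   # character(0) if everything is installed
# if (length(todo) > 0) install.packages(todo)   # uncomment to install what is missing
```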

### Full models from EduR_2: Value of Children (results in Appendix 3 of the first REQUIRED reading)

```r
# THIS WORKS -- Eff and Dow 2009 Model EduR-1.0 for VALUE OF CHILDREN v473-476, R2 ~ .107
setwd('/Users/drwhite/Documents/3sls/sccs/')  # Mac: your root directory should end in .../sccs/
load('sccsdata.Rdata')                          # highest numeric variables are sccs$v2002
sccs <- data.frame(sapply(sccsdata, function(x) as.numeric(x)))
source("http://intersci.ss.uci.edu/wiki/R/source/model/CreateModelValueSDWchildrenLang.R")  # 2nd source: edit only this file for your model
source("http://intersci.ss.uci.edu/wiki/R/two_stage_ols_full.R")  # 3rd source program (2SLS), FULL imputation
# Copy and paste to here, check for errors and correct, then copy and paste the 2nd set
source("http://intersci.ss.uci.edu/wiki/R/source/run_model_100.R")  # 4th source program (run)
source("http://intersci.ss.uci.edu/wiki/R/averageAll.R")            # 6th source program (save imputed data)
# Copy and paste to here, check for errors and correct, then copy your results to your page on the wiki,
# and rename your *.csv output file
output = average(ols_stats$imputed_data)  # output, but not yet in a data frame
class(output)                              # check what kind of object output is
output = as.data.frame(output)             # once it's a data frame it should behave just like my_sccs
output$eeextwar; my_sccs$eeextwar          # show that missing data are fully imputed
output$FxCmtyWages; my_sccs$FxCmtyWages    # show that missing data are fully imputed
```

You can also run generalized least squares (GLS; see Wikipedia: Generalized least squares) as a replacement for two_stage_ols.R. GLS is used when the variances of the observations are unequal (heteroscedasticity), or when there is some correlation between the observations. In these cases ordinary least squares can be statistically inefficient, or can even give misleading inferences. Similarly, you can run a generalized linear model (GLM) or a generalized linear mixed model (GLMM) (see Wikipedia: Generalized linear model; Generalized linear mixed model).

```r
# Each line is an alternative to two_stage_ols.R (the 3rd source program): use one
source("http://intersci.ss.uci.edu/wiki/R/two_stage_gls.R")
source("http://intersci.ss.uci.edu/wiki/R/two_stage_glm.R")
source("http://intersci.ss.uci.edu/wiki/R/two_stage_glmm.R")
# _full versions (cf. two_stage_ols_full.R)
source("http://intersci.ss.uci.edu/wiki/R/two_stage_gls_full.R")
source("http://intersci.ss.uci.edu/wiki/R/two_stage_glm_full.R")
source("http://intersci.ss.uci.edu/wiki/R/two_stage_glmm_full.R")
```
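To see what GLS does, here is a base-R sketch. It is not the two_stage_gls.R script. It simulates hypothetical data whose error variance grows with x, then computes the GLS estimator $\hat\beta = (X^\top\Omega^{-1}X)^{-1}X^\top\Omega^{-1}y$ with a known diagonal covariance $\Omega$. Finally, it checks that the result matches weighted least squares, which is GLS with a diagonal $\Omega$. In real data $\Omega$ must be estimated; assuming it is known is a simplification.

```r
set.seed(7)
n <- 100
x <- runif(n, 1, 10)
sigma2 <- x^2                                   # error variance grows with x (heteroscedasticity)
y <- 1 + 2 * x + rnorm(n, sd = sqrt(sigma2))

X <- cbind(1, x)                                # design matrix with intercept
Omega_inv <- diag(1 / sigma2)                   # inverse of the (assumed known) diagonal covariance
beta_gls <- solve(t(X) %*% Omega_inv %*% X, t(X) %*% Omega_inv %*% y)

# With a diagonal covariance, GLS is weighted least squares
beta_wls <- coef(lm(y ~ x, weights = 1 / sigma2))
cbind(gls = as.vector(beta_gls), wls = beta_wls)
```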

### Full models from EduR_1.5

See EduR-1.5, section create_EduR_1/create_EduR_1.5DistBrownEffSign.R (PCAP with 1st PCA depvar v238):

```r
source("examples/create/create_EduR_1/create_EduR_1.5DistBrownEffSign.R")  # path relative to your working directory (.../sccs/)
```

### Full models from wiki/R

```r
# THIS WORKS -- Brown and Eff Model for MORAL GODS v238, R2 ~ .409 (two_stage_ols_full.R)
setwd('/Users/drwhite/Documents/3sls/sccs/')  # Mac: your root directory should end in .../sccs/
load('sccsdata.Rdata')                          # highest numeric variables are sccs$v2002
sccs <- data.frame(sapply(sccsdata, function(x) as.numeric(x)))
source("http://intersci.ss.uci.edu/wiki/R/source/model/create_EduR_1.5DistBrownEffSign.R")  # 2nd source: edit only this file for your model
source("http://intersci.ss.uci.edu/wiki/R/two_stage_ols_full.R")  # 3rd source program (2SLS), FULL imputation
# Copy and paste to here, check for errors and correct, then copy and paste the 2nd set
source("http://intersci.ss.uci.edu/wiki/R/source/run_model_100.R")  # 4th source program (run)
source("http://intersci.ss.uci.edu/wiki/R/averageAll.R")            # 6th source program (save imputed data)
# Copy and paste to here, check for errors and correct, then copy your results to your page on the wiki,
# and rename your *.csv output file
output = average(ols_stats$imputed_data)  # output, but not yet in a data frame
class(output)                              # check what kind of object output is
output = as.data.frame(output)             # once it's a data frame it should behave just like my_sccs
output$eeextwar; my_sccs$eeextwar          # show that missing data are fully imputed
```

```r
# THIS WORKS -- Brown and Eff Model for MORAL GODS v238, R2 ~ .409 (two_stage_ols.R)
setwd('/Users/drwhite/Documents/3sls/sccs/')  # Mac: your root directory should end in .../sccs/
load('sccsdata.Rdata')                          # highest numeric variables are sccs$v2002
sccs <- data.frame(sapply(sccsdata, function(x) as.numeric(x)))
source("http://intersci.ss.uci.edu/wiki/R/source/model/create_EduR_1.5DistBrownEffSign.R")  # 2nd source: edit only this file for your model
source("http://intersci.ss.uci.edu/wiki/R/two_stage_ols.R")        # 3rd source program (2SLS)
# Copy and paste to here, check for errors and correct, then copy and paste the 2nd set
source("http://intersci.ss.uci.edu/wiki/R/source/run_model_100.R")  # 4th source program (run)
source("http://intersci.ss.uci.edu/wiki/R/averageAll.R")            # 6th source program (save imputed data)
# Copy and paste to here, check for errors and correct, then copy your results to your page on the wiki,
# and rename your *.csv output file
output = average(ols_stats$imputed_data)  # output, but not yet in a data frame
class(output)                              # check what kind of object output is
output = as.data.frame(output)             # once it's a data frame it should behave just like my_sccs
output$eeextwar; my_sccs$eeextwar          # show that missing data are fully imputed
```

```r
# THIS WORKS AND CAN BE USED -- White et al. Model for MORAL GODS v238 (overfitted): R2 = .469 with 7 variables + distance
load('sccsdata.Rdata')  # highest numeric variables are sccs$v2002
sccs <- data.frame(sapply(sccsdata, function(x) as.numeric(x)))
source("http://intersci.ss.uci.edu/wiki/R/source/model/create_EduR_1.5DistB_EffHiddenVariablesNew.R")  # 2nd source program
source("http://intersci.ss.uci.edu/wiki/R/two_stage_ols.R")         # 3rd source program (2SLS)
source("http://intersci.ss.uci.edu/wiki/R/source/run_model_100.R")  # 4th source program (run)
# R2 = .454 with 4 variables + distance, no Snarey variables
source("http://intersci.ss.uci.edu/wiki/R/source/averageAll.R")
output = average(ols_stats$imputed_data)  # output, but not yet in a data frame
class(output)                              # check what kind of object output is
output = as.data.frame(output)             # once it's a data frame it should behave just like my_sccs
output$eextwar; my_sccs$eextwar            # show that missing data are fully imputed
```

```r
##source("http://intersci.ss.uci.edu/wiki/R/source/model/create_EduR_1.5DistB_EffHiddenVariablesNew.R")  # 2nd source program
source("http://intersci.ss.uci.edu/wiki/R/two_stage_ols.R")         # 3rd source program (2SLS)
source("http://intersci.ss.uci.edu/wiki/R/source/run_model_100.R")  # 4th source program (run)
```

## Possible error messages: errors and warnings

- The following error is caused by not having the Snarey variable $Rain. It will also occur for other Snarey variables unless you use the right load(...

  `Error in data.frame(dep_var = sccs$HiGod4, evileye = sccs$v1189, no_rain = sccs$Rain, : arguments imply differing number of rows: 0, 186`

- `` Error in `[.data.frame`(y, train_idxs) : undefined columns selected ``: no dep_var=sccs$v999 was defined.
- `Error in as.matrix(dataset_i[, indep_vars]): error in evaluating the argument 'x' in selecting a method for function 'as.matrix'`: there was an undefined variable, "superjh", which I had confused with "SuperjhWriting", the new dependent variable. I then had to remove PCsize and PCsize2, which were built from a PCA of "superjh" and another variable.
- `Error in linearHypothesis.lm(lm.unrestricted, drop_vars, test = "Chisq") : there are aliased coefficients in the model` (two variables have the same sccs$vnumber).
  - This error is in indep_vars=c(.... )
  - If you can't get the error out of indep_vars (same as lm.unrestricted), do the following: copy the variables from restrict_vars into the first line, remove them elsewhere, allow one other variable, and comment out the rest. Restore the commented-out variables as needed for later models. Always keep one extra variable in indep_vars.
- `Error in linearHypothesis.lm(lm.restricted, ...`
  - This error is in restrict_vars=c(.... )

- Warning message: `package 'foreign' was built under R version 2.13.2`: reinstall the package (install.packages("foreign")) and reload it with library(foreign).
- `Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection`: the requested file name was not found. Check the file name and your working directory (getwd()).
- `In cbind(combined, round(tot_cor, sig_digits)) : number of rows of result is not a multiple of vector length (arg 2)` (cause not yet known)

- FURTHER INSTRUCTIONS FOR FIXING ERRORS
  - When you put a new dep_var into my_sccs, take its name out of my_sccs, indep_vars=c(.... ), and restrict_vars=c(.... ).
  - Take care when you remove a variable label from my_sccs, indep_vars=c(.... ), or restrict_vars=c(.... ): do not put comments inside the parentheses.
  - When editing your create file and getting errors:
    - re-run the create file alone, see whether there are errors, correct them, and run it again alone;
    - once it works, re-run the OLS script and then the run script (e.g., run_model_100.R);
    - if an error occurs, repeat in the order above.

- `Error in write.table(title, file = summary_results_file, append = F, col.names = F, : invalid 'row.names' specification`: this occurs only because library(sna) was called.
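Several of the errors above (sccs$Rain, sccs$v999, "superjh") come from a variable name that is not in the data. In R, using $ to ask a data frame for a column it does not have returns NULL rather than an error, so the problem only surfaces later. Below is a minimal sketch of a check to run before the create file. missing_vars and the toy data frame are hypothetical.

```r
# Hypothetical helper: names in `needed` that are not columns of `data`
missing_vars <- function(data, needed) setdiff(needed, names(data))

# Toy stand-in for sccs (hypothetical values)
toy <- data.frame(v238 = c(1, 4, 2), v1189 = c(0, 1, 1))
is.null(toy$Rain)                               # TRUE: no error, just NULL
missing_vars(toy, c("v238", "v1189", "Rain"))   # "Rain"

# With the real data, list every name your create file uses, e.g.:
# missing_vars(sccs, c("v238", "v1189", "Rain"))
```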

## Saving results on the wiki

- Open your wiki page for editing at a SECTION that contains a program. Copy the program, then paste it into R.
- Use `==Heading==` to create a heading.

## Dropbox downloads and install - backup access to files and scripts

- For the zip files: right-click to download, then double-click the zip file. The .dta file (e.g., dist25wm.dta) appears in that directory.

- The zip files were made with the Mac Finder (File > Compress).

- http://dl.dropbox.com/u/37813961/sccsA.Rdata (see User:Jingkun Niu: the W matrix for religion)
- http://dl.dropbox.com/u/37813961/dist25wm.dta.zip: put in (root) sccs/examples/data
- http://dl.dropbox.com/u/37813961/langwm.dta.zip: put in (root) sccs/examples/data
- http://dl.dropbox.com/u/37813961/vaux.Rdata (DRWHITE): (root) sccs/examples/data; auto download
- http://dl.dropbox.com/u/9256203/sccsfac.Rdata (Anthon Eff): R data file with named category labels for each variable; auto download
- http://dl.dropbox.com/u/37813961/sccsfac.Rdata (DRWHITE): (root) sccs/examples/data; copy of the file above
- https://dl.dropbox.com/u/37813961/create_EduR_1.5DistB_EffHiddenVariables.R: (root) sccs/examples/create/; 2nd script
- https://dl.dropbox.com/u/37813961/create_EduR_1.5DistB_EffHiddenVariables2.R: (root) sccs/examples/create/; 2nd script
- https://dl.dropbox.com/u/37813961/create_EduR_1.5DistB_EffHiddenVariablesNew.R: (root) sccs/examples/create/; 2nd script
- https://dl.dropbox.com/u/37813961/create_EduR_1.5DistBrownEffSign.R: (root) sccs/examples/create/; 2nd script
- https://dl.dropbox.com/u/37813961/create_EduR_1.8DistBrownEff100HiGod4OLSWageSuperjh_Islam.R
- https://dl.dropbox.com/u/37813961/create_model_value_childrenLangOnly.R: (root) sccs/examples/create/; 2nd script
- https://dl.dropbox.com/u/37813961/two_stage_ols_full.R (DRWHITE): (root) sccs/R_3_s_ols/; 3rd script
- https://dl.dropbox.com/u/37813961/two_stage_ols.R (DRWHITE): (root) sccs/R_3_s_ols/; 3rd script
- https://dl.dropbox.com/u/37813961/run_model_100.R (DRWHITE): (root) sccs/examples/create/R_run; 4th script
- https://dl.dropbox.com/u/37813961/run_model_79.R (DRWHITE): (root) sccs/examples/create/R_run; 4th script
- https://dl.dropbox.com/u/37813961/average.R (DRWHITE): (root) sccs/R_3_s_ols/; 5th script
- https://dl.dropbox.com/u/37813961/averageAll.R (DRWHITE): (root) sccs/R_3_s_ols/; 5th script

## Refer to the sccs codebook to choose variables

- Review of theories in Cross-Cultural Research
- http://dl.dropbox.com/u/9256203/ccc.txt #Anthon Eff concise codebook for the R data file sccs.Rdata
- Sources for codes and articles on the SCCS
- http://eclectic.ss.uci.edu/~drwhite/courses/SCCCodes.htm #DRWhite verbose codebook for the R data file sccsdata.Rdata (and the original SPSS data file)