# EduMod Mac-9: (your dep var here) Imputation and Regression

this EduMod-3 is by Douglas R. White and is protected from changes. RUN THE MAIN PROGRAM AND THEN RUN THE WORKSHOP PROGRAM. Copy this entire file to your EduMod-X page as instructed. Doug's sequential link: EduMod-6: (your dep var here) Imputation and Regression to next EduMod Contents [hide] 1 Your workshop 2 Workshop program 3 Results 3.1 From workshop 4 Programs: Copy and paste new xR<- Amanda's finished project 5 Results 6 A| this works! GOOD START HERE FOR OTHER PROJECTS Expanded xR to full xUR- ALL indvars w/ depvar="rape" JUST CHANGE YOUR DEPVAR 7 Results for rape667 8 A| delete High VIFs for 667 rape==this works! GOOD START HERE FOR OTHER PROJECTS Expanded xR to full xUR- ALL indvars w/ depvar="rape" JUST CHANGE YOUR DEPVAR 9 Results Your workshop

Back to Imputing_data_for_Regression_Analysis#EduMod Doug's sequential link: EduMod-6: (your dep var here) Imputation and Regression to next EduMod Your main work will be done here after you get your first results with the (4 Programs: Copy and paste) program and paste them below. The demonstration program (Eff and Dow') shows only the final step not the intermediate ones. Here I post the last part of the program. Here you start again to re-estimate the model, this time substituting in xR= (final variables so that xR=xUR (all independent variables). That is done by copying the list of variables that you see after

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --Stage 2 OLS estimate of unrestricted model--

(but dont copy the xUR<-lm itself) down to replace the list of xR<-lm variables (the function lm(depvar~list of indep vars) is what does the multiple linear regression, lm="linear model". You will see I have ###commented out the original xR<-lm. Note that for your project rather than this demo you will want also to change your depvar and add to the list of independent variables, working from the codebook, using the variables numbers, like v860, and giving them names in program 1. You probably dont have to change the excluded variables under

1. --collect some model diagnostics--

(NOTE: all program statements in the wiki must have a space in the first column) Now, in a real project, the results you will get by setting xR==xUR need to culled into two sets: those with p<.05 and those to exclude. The analysis of excluded variables when properly specified (the two sets should really be mutually exclusive) will tell you whether you have omitted anything that should be excluded. Read EFF and DOW carefully for this step. IMPORTANT: put both the FULL SET of results for all your xUR variables, which you will obtain here, and the CULLED SET of results for your xR variables, into the RESULTS section of this page so you and I can check whether you have done the analysis correctly. Workshop program

Program 2: Modified for all xR=xUR except dateobs - should take about 40 seconds

1. MI--estimate model with network-lagged dependent variables, combine results

rm(list=ls(all=TRUE))

1. --Set path to your directory with data and program--

setwd("c:/My Documents/MI") options(echo=TRUE)

1. --need these packages for estimation and diagnostics--

library(foreign) library(spdep) library(car) library(lmtest) library(sandwich)

1. -----------------------------
2. --Read in data, rearrange----
3. -----------------------------
1. --Read in original SCCS data---

1. --Read in two weight matrices--

1. --Read in the imputed dataset---

1. HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
2. --create dep.varb. you wish to use from SCCS data--
3. --Here we sum variables measuring how much a society values children--
4. --can replace "sum" with "max"

depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum)

1. --find obs. for which dep. varb. is non-missing--

zdv<-which(!is.na(depvar)) depvar<-depvar[zdv]

1. HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED

depvarname<-"child value"

1. --can add additional SCCS variable, but only if it has no missing values---

dateobs<-SCCS\$v838 dateobs<-dateobs[zdv]

1. --look at frequencies and quartiles for the dep. varb.--

summary(depvar) table(depvar)

1. --modify weight matrices---
2. --set diagonal equal to zeros--

diag(ll)<-0 diag(dd)<-0

1. --use only obs. where dep. varb. non-missing--

ll<-ll[zdv,zdv] dd<-dd[zdv,zdv]

1. --row standardize (rows sum to one)

ll<-ll/rowSums(ll) dd<-dd/rowSums(dd)

1. --make weight matrix object for later autocorrelation test--

wmatll<-mat2listw(as.matrix(ll)) wmatdd<-mat2listw(as.matrix(dd))

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT

indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods", "fempower","sexratio","war","himilexp","wagelabor","famsize","settype", "localjh","money","cultints","roots","cereals","gath","hunt","fish", "anim","pigs","milk","plow","bovines","tree","foodtrade", "ndrymonth","ecorich","popdens","pathstress","CVrain","rain", "temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg")

1. -----------------------------------------------------
2. ---Estimate model on each imputed dataset------------
3. -----------------------------------------------------
1. --number of imputed datasets--

nimp<-10

1. --will append values to these empty objects--

vif<-NULL ss<-NULL beta<-NULL dng<-NULL

1. --loop through the imputed datasets--

for (i in 1:nimp){

1. --select the ith imputed dataset--

m9<-impdat[which(impdat\$.imp==i),]

1. --retain only obs. for which dep. varb. is nonmissing--

m9<-m9[zdv,]

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --create spatially lagged dep. varbs. in stage 1 OLS--

y<-as.matrix(depvar) xx<-as.matrix(m9[,indpv])

1. --for instruments we use the spatial lag of our indep. varbs.--
2. --First, the spatially lagged varb. for distance--

xdy<-dd%*%xx cyd<-dd%*%y o<-lm(cyd~xdy)

1. --the fitted value is our instrumental variable--

fydd<-fitted(o)

1. --keep R2 from this regression--

dr2<-summary(o)\$r.squared

1. --Then, the spatially lagged varb. for language--

xly<-ll%*%xx cyl<-ll%*%y o<-lm(cyl~xly)

1. --the fitted value is our instrumental variable--

fyll<-fitted(o)

1. --keep R2 from this regression--

lr2<-summary(o)\$r.squared m9<-cbind(m9,fydd,fyll)

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --Stage 2 OLS estimate of unrestricted model--

xUR<-lm(depvar~fyll+fydd+dateobs+ cultints+roots+cereals+gath+plow+ hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+ ecorich+popdens+pathstress+exogamy+ncmallow+famsize+ settype+localjh+superjh+moralgods+fempower+femsubs+ sexratio+war+himilexp+money+wagelabor+ migr+brideprice+nuclearfam+pctFemPolyg ,data=m9)

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --Stage 2 OLS estimate of restricted model--
1. xR<-lm(depvar ~ fyll + cultints + roots + fish +
2. exogamy + settype + femsubs, data = m9)
3. THIS IS JUST PROGRAM 2 EXCEPT DOUG WHITE REDEFINED xR=same as xUR but deleted "dateobs," in line 2
```xR<-lm(depvar~fyll+fydd+
cultints+roots+cereals+gath+plow+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg
```

,data=m9)

1. --corrected sigma2 and R2 for 2SLS--

qxx<-m9 qxx[,"fydd"]<-cyd qxx[,"fyll"]<-cyl b<-coef(xR) incpt<-matrix(1,NROW(qxx),1) x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]])) e<-y-x%*%as.matrix(b) cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x))) cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

1. --collect coefficients and their variances--

ov<-summary(xR) vif<-rbind(vif,vif(xR)) ss<-rbind(ss,diag(ov\$cov*cs2))

1. --collect robust coef. variances when there is heteroskedasticity--
2. eb<-e^2
3. x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
4. hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
5. ss<-rbind(ss,diag(hcm))

beta<-rbind(beta,coef(xR))

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --collect some model diagnostics--

dropt<-c("cereals","gath","plow","hunt","anim","dateobs", "pigs","milk","bovines","foodscarc","ecorich", "popdens","pathstress","ncmallow","famsize","localjh", "superjh","moralgods","fempower","sexratio","money", "fydd","wagelabor","war","himilexp","tree","foodtrade")

1. --Ramsey RESET test--

p1<-qchisq(resettest(xR,type="fitted")\$"p.value",1,lower.tail=FALSE)

1. --Wald test (H0: dropped variables have coefficient equal zero)--

o<-linear.hypothesis(xUR,dropt,test="Chisq")\$"Pr(>Chisq)"[2] p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue

1. --Heteroskedasticity test (H0: homoskedastic residuals)--

p3<-ncv.test(xR)\$ChiSquare

1. --Shapiro-Wilke normality test (H0: residuals normal)

p4<-qchisq(shapiro.test(e)\$p.value,1,lower.tail=FALSE)

1. --LaGrange Multiplier test for spatial autocorrelation: language--

o<-lm.LMtests(xR, wmatll, test=c("LMlag")) p5<-as.numeric(o\$LMlag\$statistic)

1. --LaGrange Multiplier test for spatial autocorrelation: distance--

o<-lm.LMtests(xR, wmatdd, test=c("LMlag")) p6<-as.numeric(o\$LMlag\$statistic)

1. --model R2--

p7<-cr2 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

1. --------------------------------------------
2. --Rubin's formulas for combining estimates--
3. --------------------------------------------
1. --first find final regr. coefs. and p-values--

mnb<-apply(beta,2,mean) vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1) mnv<-apply(ss,2,mean) vrT<-mnv+vrb*(1-nimp^(-1)) fst<-mnb^2/vrT r<-(1+nimp^(-1))*vrb/mnv v<-(nimp-1)*(1+r^(-1))^2 pval<-pf(fst,1,v,lower.tail=FALSE) bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) bbb\$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3) names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

1. --Then combine the diagnostics we collected--

dng<-data.frame(dng) names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd", "R2:final model","R2:IV(distance)","R2:IV(language)") r2<-apply(dng[,7:9],2,mean) adng<-dng[,1:6] mdm<-apply(adng,2,mean) vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1) aa<-4*mdm^2-2*vrd aa[which(aa<0)]<-0 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5) vd<-(nimp-1)*(1+rd^(-1))^2 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)

1. -All chi-sq we collected have df=1-------

pvald<-pf(Dm,1,vd,lower.tail=FALSE) ccc<-data.frame(round(cbind(Dm,vd,pvald),3)) names(ccc)<-c("Fstat","df","pvalue")

bbb r2 ccc

1. --write results to csv file for perusal in spreadsheet--

write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE) write.csv(bbb,file="OLSresults.csv",append=TRUE) write.csv(r2,file="OLSresults.csv",append=TRUE) write.csv(ccc,file="OLSresults.csv",append=TRUE) Results

From workshop all xR=xUR except dateobs. Your next step if this were a new project would be to keep xR=only those with p<.05 bbb

```             coef Fstat       ddf pvalue   VIF
```

(Intercept) -3.778 0.048 3518.772 0.827 NA fyll 2.092 5.493 72408.940 0.019 4.311 fydd -0.701 1.525 27950.568 0.217 3.570 cultints 1.090 3.768 6088.808 0.052 5.222 roots -4.595 3.656 3015.253 0.056 5.091 cereals -1.476 0.386 1699.391 0.534 7.508 gath -0.492 0.925 3771.955 0.336 3.224 plow -2.235 1.249 5935.504 0.264 3.178 hunt -0.162 0.083 7873.798 0.773 5.378 fish 0.273 0.432 1534.074 0.511 3.200 anim -0.138 0.079 22553.987 0.779 5.968 pigs 0.560 0.092 3241.095 0.762 2.233 milk -1.520 0.681 5301.829 0.409 3.970 bovines 1.870 1.011 2777.386 0.315 4.355 tree -5.090 2.826 9236.350 0.093 3.306 foodtrade 0.091 3.101 6431.573 0.078 1.613 foodscarc -0.385 0.947 490.565 0.331 1.306 ecorich -0.198 0.178 1018.154 0.673 1.843 popdens -0.377 0.503 5074.414 0.478 3.888 pathstress -0.119 0.391 4055.907 0.532 2.841 exogamy -0.927 4.464 13709.866 0.035 1.478 ncmallow -0.137 0.387 925.286 0.534 1.682 famsize 0.297 0.335 3180.405 0.563 2.211 settype -0.532 2.115 3169.088 0.146 4.289 localjh -0.572 0.326 1678.280 0.568 1.889 superjh -0.080 0.019 1443.457 0.890 2.719 moralgods 0.162 0.079 274.976 0.779 2.265 fempower 0.302 1.043 219.597 0.308 1.380 femsubs 0.932 5.737 16349.251 0.017 1.880 sexratio -0.369 0.152 114.773 0.697 1.385 war -0.104 1.804 356.775 0.180 1.396 himilexp 1.149 1.009 712.762 0.315 1.621 money 0.369 0.700 6880.529 0.403 2.362 wagelabor -0.959 1.862 85.903 0.176 1.537 migr 0.352 0.092 248.280 0.762 1.567 brideprice -0.927 0.559 8133.950 0.455 2.032 nuclearfam -0.886 0.383 2955.185 0.536 2.374 pctFemPolyg 0.003 0.013 278.994 0.908 1.831 > r2

```R2:final model R2:IV(distance) R2:IV(language)
0.2735421       0.9259708       0.9666807
```

> ccc

```                Fstat         df pvalue
```

RESET 2.619 159.018 0.108 Wald on restrs. 0.286 200.401 0.594 NCV -0.015 190.566 1.000 SWnormal 0.172 1898.653 0.678 lagll 3.103 530583.685 0.078 lagdd 4.352 402962.502 0.037 Programs: Copy and paste new xR<- Amanda's finished project

These 2 programs took 48 seconds on my computer Doug 08:57, 1 October 2009 (PDT) Program 1 --> Program 2

1. MI--make the imputed datasets
2. --change the following path to the directory with your data and program--

setwd("C:/My Documents/MI")) rm(list=ls(all=TRUE)) options(echo=TRUE)

1. --you need the following two packages--you must install them first--

library(foreign) library(mice) library(tripak) library(zoo) library(sp) library(maptools) library(spam)

1. --To find the citation for a package, use this function:---

citation("mice")

1. -----------------------------
2. --Read in data, rearrange----
3. -----------------------------
1. --Read in auxiliary variables---

1. --Read in the SCCS dataset---

1. --look at first 6 rows of vaux--

1. --look at field names of vaux--

names(vaux)

1. --check to see that rows are properly aligned in the two datasets--
2. --sum should equal 186---

sum((SCCS\$socname==vaux\$socname)*1)

1. --remove the society name field--

vaux<-vaux[,-28] names(vaux)

1. --Two nominal variables: brg and rlg----
2. --brg: consolidated Burton Regions-----
3. 0 = (rest of world) circumpolar, South and Meso-America, west North America
4. 1 = Subsaharan Africa
5. 2 = Middle Old World
6. 3 = Southeast Asia, Insular Pacific, Sahul
7. 4 = Eastern Americas
8. --rlg: Religion---
9. '0 (no world religion)'
10. '1 (Christianity)'
11. '2 (Islam)'
12. '3 (Hindu/Buddhist)'
1. --check to see number of missing values in vaux,
2. --whether variables are numeric,
3. --and number of discrete values for each variable---

vvn<-names(vaux) pp<-NULL for (i in 1:length(vvn)){ nmiss<-length(which(is.na(vaux[,vvn[i]]))) numeric<-is.numeric(vaux[,vvn[i]]) numDiscrVals<-length(table(vaux[,vvn[i]])) pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals)) } row.names(pp)<-vvn pp

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --extract variables to be used from SCCS, put in dataframe fx--

fx<-data.frame( socname=SCCS\$socname,socID=SCCS\$"sccs#", valchild=(SCCS\$v473+SCCS\$v474+SCCS\$v475+SCCS\$v476), cultints=SCCS\$v232,roots=(SCCS\$v233==5)*1, cereals=(SCCS\$v233==6)*1,gath=SCCS\$v203,hunt=SCCS\$v204, fish=SCCS\$v205,anim=SCCS\$v206,femsubs=SCCS\$v890, pigs=(SCCS\$v244==2)*1,milk=(SCCS\$v245>1)*1,plow=(SCCS\$v243>1)*1, bovines=(SCCS\$v244==7)*1,tree=(SCCS\$v233==4)*1, foodtrade=SCCS\$v819,foodscarc=SCCS\$v1685, ecorich=SCCS\$v857,popdens=SCCS\$v156,pathstress=SCCS\$v1260, CVrain=SCCS\$v1914/SCCS\$v1913,rain=SCCS\$v854,temp=SCCS\$v855, AP1=SCCS\$v921,AP2=SCCS\$v928,ndrymonth=SCCS\$v196, exogamy=SCCS\$v72,ncmallow=SCCS\$v227,famsize=SCCS\$v80, settype=SCCS\$v234,localjh=(SCCS\$v236-1),superjh=SCCS\$v237, moralgods=SCCS\$v238,fempower=SCCS\$v663, sexratio=1+(SCCS\$v1689>85)+(SCCS\$v1689>115), war=SCCS\$v1648,himilexp=(SCCS\$v899==1)*1, money=SCCS\$v155,wagelabor=SCCS\$v1732, migr=(SCCS\$v677==2)*1,brideprice=(SCCS\$v208==1)*1, nuclearfam=(SCCS\$v210<=3)*1,pctFemPolyg=SCCS\$v872 )

1. --look at first 6 rows of fx--

1. --check to see number of missing values--
2. --also check whether numeric--

vvn<-names(fx) pp<-NULL for (i in 1:length(vvn)){ nmiss<-length(which(is.na(fx[,vvn[i]]))) numeric<-is.numeric(fx[,vvn[i]]) pp<-rbind(pp,cbind(nmiss,data.frame(numeric))) } row.names(pp)<-vvn pp

1. --identify variables with missing values--

z<-which(pp[,1]>0) zv1<-vvn[z] zv1

1. --identify variables with non-missing values--

z<-which(pp[,1]==0) zv2<-vvn[z] zv2

1. -----------------------------
2. ----Multiple imputation------
3. -----------------------------
1. --number of imputed data sets to create--

nimp<-10

1. --one at a time, loop through those variables with missing values--

for (i in 1:length(zv1)){

1. --attach the imputand to the auxiliary data--

zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))

1. --in the following line, the imputation is done--

aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")

1. --during first iteration of the loop, create dataframe impdat--

if (i==1){ impdat<-data.frame(aqq[,c(".id",".imp")]) }

1. --the imputand is placed as a field in impdat and named--

impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)])) names(impdat)[NCOL(impdat)]<-zv1[i] }

1. --now the non-missing variables are attached to impdat--

gg<-NULL for (i in 1:nimp){ gg<-rbind(gg,data.frame(fx[,zv2])) } impdat<-cbind(impdat,gg)

1. --take a look at the top 6 and bottom 6 rows of impdat--

1. --impdat is saved as an R-format data file--

save(impdat,file="impdat.Rdata")

Program 2

1. MI--estimate model with network-lagged dependent variables, combine results

rm(list=ls(all=TRUE))

1. --Set path to your directory with data and program--

setwd("C:/My Documents/MI") options(echo=TRUE)

1. --need these packages for estimation and diagnostics--

library(foreign) library(spdep) library(car) library(lmtest) library(sandwich)

1. MAC

library(foreign) library(mice) library(tripak) library(zoo) library(sp) library(maptools) library(spam)

1. -----------------------------
2. --Read in data, rearrange----
3. -----------------------------
1. --Read in original SCCS data---

1. --Read in two weight matrices--

1. --Read in the imputed dataset---

1. HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
2. --create dep.varb. you wish to use from SCCS data--
3. --Here we sum variables measuring how much a society values children--
4. --can replace "sum" with "max"

depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum)

1. --find obs. for which dep. varb. is non-missing--

zdv<-which(!is.na(depvar)) depvar<-depvar[zdv]

1. HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED

depvarname<-"child value"

1. --can add additional SCCS variable, but only if it has no missing values---

dateobs<-SCCS\$v838 dateobs<-dateobs[zdv]

1. --look at frequencies and quartiles for the dep. varb.--

summary(depvar) table(depvar)

1. --modify weight matrices---
2. --set diagonal equal to zeros--

diag(ll)<-0 diag(dd)<-0

1. --use only obs. where dep. varb. non-missing--

ll<-ll[zdv,zdv] dd<-dd[zdv,zdv]

1. --row standardize (rows sum to one)

ll<-ll/rowSums(ll) dd<-dd/rowSums(dd)

1. --make weight matrix object for later autocorrelation test--

wmatll<-mat2listw(as.matrix(ll)) wmatdd<-mat2listw(as.matrix(dd))

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT

indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods", "fempower","sexratio","war","himilexp","wagelabor","famsize","settype", "localjh","money","cultints","roots","cereals","gath","hunt","fish", "anim","pigs","milk","plow","bovines","tree","foodtrade", "ndrymonth","ecorich","popdens","pathstress","CVrain","rain", "temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg")

1. -----------------------------------------------------
2. ---Estimate model on each imputed dataset------------
3. -----------------------------------------------------
1. --number of imputed datasets--

nimp<-10

1. --will append values to these empty objects--

vif<-NULL ss<-NULL beta<-NULL dng<-NULL

1. --loop through the imputed datasets--

for (i in 1:nimp){

1. --select the ith imputed dataset--

m9<-impdat[which(impdat\$.imp==i),]

1. --retain only obs. for which dep. varb. is nonmissing--

m9<-m9[zdv,]

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --create spatially lagged dep. varbs. in stage 1 OLS--

y<-as.matrix(depvar) xx<-as.matrix(m9[,indpv])

1. --for instruments we use the spatial lag of our indep. varbs.--
2. --First, the spatially lagged varb. for distance--

xdy<-dd%*%xx cyd<-dd%*%y o<-lm(cyd~xdy)

1. --the fitted value is our instrumental variable--

fydd<-fitted(o)

1. --keep R2 from this regression--

dr2<-summary(o)\$r.squared

1. --Then, the spatially lagged varb. for language--

xly<-ll%*%xx cyl<-ll%*%y o<-lm(cyl~xly)

1. --the fitted value is our instrumental variable--

fyll<-fitted(o)

1. --keep R2 from this regression--

lr2<-summary(o)\$r.squared m9<-cbind(m9,fydd,fyll)

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --Stage 2 OLS estimate of unrestricted model--

xUR<-lm(depvar~fyll+fydd+dateobs+ cultints+roots+cereals+gath+plow+ hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+ ecorich+popdens+pathstress+exogamy+ncmallow+famsize+ settype+localjh+superjh+moralgods+fempower+femsubs+ sexratio+war+himilexp+money+wagelabor+ migr+brideprice+nuclearfam+pctFemPolyg ,data=m9)

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --Stage 2 OLS estimate of restricted model--

xR<-lm(depvar ~ fyll + cultints + roots + fish +

```   exogamy + settype + femsubs, data = m9)
```
1. --corrected sigma2 and R2 for 2SLS--

qxx<-m9 qxx[,"fydd"]<-cyd qxx[,"fyll"]<-cyl b<-coef(xR) incpt<-matrix(1,NROW(qxx),1) x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]])) e<-y-x%*%as.matrix(b) cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x))) cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

1. --collect coefficients and their variances--

ov<-summary(xR) vif<-rbind(vif,vif(xR)) ss<-rbind(ss,diag(ov\$cov*cs2))

1. --collect robust coef. variances when there is heteroskedasticity--
2. eb<-e^2
3. x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
4. hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
5. ss<-rbind(ss,diag(hcm))

beta<-rbind(beta,coef(xR))

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --collect some model diagnostics--

dropt<-c("cereals","gath","plow","hunt","anim","dateobs", "pigs","milk","bovines","foodscarc","ecorich", "popdens","pathstress","ncmallow","famsize","localjh", "superjh","moralgods","fempower","sexratio","money", "fydd","wagelabor","war","himilexp","tree","foodtrade")

1. --Ramsey RESET test--

p1<-qchisq(resettest(xR,type="fitted")\$"p.value",1,lower.tail=FALSE)

1. --Wald test (H0: dropped variables have coefficient equal zero)--

o<-linear.hypothesis(xUR,dropt,test="Chisq")\$"Pr(>Chisq)"[2] p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue

1. --Heteroskedasticity test (H0: homoskedastic residuals)--

p3<-ncv.test(xR)\$ChiSquare

1. --Shapiro-Wilke normality test (H0: residuals normal)

p4<-qchisq(shapiro.test(e)\$p.value,1,lower.tail=FALSE)

1. --LaGrange Multiplier test for spatial autocorrelation: language--

o<-lm.LMtests(xR, wmatll, test=c("LMlag")) p5<-as.numeric(o\$LMlag\$statistic)

1. --LaGrange Multiplier test for spatial autocorrelation: distance--

o<-lm.LMtests(xR, wmatdd, test=c("LMlag")) p6<-as.numeric(o\$LMlag\$statistic)

1. --model R2--

p7<-cr2 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

1. --------------------------------------------
2. --Rubin's formulas for combining estimates--
3. --------------------------------------------
1. --first find final regr. coefs. and p-values--

mnb<-apply(beta,2,mean) vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1) mnv<-apply(ss,2,mean) vrT<-mnv+vrb*(1-nimp^(-1)) fst<-mnb^2/vrT r<-(1+nimp^(-1))*vrb/mnv v<-(nimp-1)*(1+r^(-1))^2 pval<-pf(fst,1,v,lower.tail=FALSE) bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) bbb\$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3) names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

1. --Then combine the diagnostics we collected--

dng<-data.frame(dng) names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd", "R2:final model","R2:IV(distance)","R2:IV(language)") r2<-apply(dng[,7:9],2,mean) adng<-dng[,1:6] mdm<-apply(adng,2,mean) vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1) aa<-4*mdm^2-2*vrd aa[which(aa<0)]<-0 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5) vd<-(nimp-1)*(1+rd^(-1))^2 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)

1. -All chi-sq we collected have df=1-------

pvald<-pf(Dm,1,vd,lower.tail=FALSE) ccc<-data.frame(round(cbind(Dm,vd,pvald),3)) names(ccc)<-c("Fstat","df","pvalue")

aaa<-c(table(depvar), NROW(depvar),depvarname) nn aaa bbb r2 ccc

1. --write results to csv file for perusal in spreadsheet--

write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE) write.csv(bbb,file="OLSresults.csv",append=TRUE) write.csv(r2,file="OLSresults.csv",append=TRUE) write.csv(ccc,file="OLSresults.csv",append=TRUE)  Results

>aaa

```          8            10            12            14            16
"2"           "2"           "1"           "3"           "6"
17            18            19            20            22
"1"          "12"           "2"          "25"          "19"
23            24            26            28            30
"1"          "28"          "18"          "19"           "7"
32            34            36
"17"           "2"           "6"         "171" "child value"
coef Fstat          ddf pvalue   VIF
```

(Intercept) -9.986 0.793 732145.4 0.373 NA fyll 1.398 8.026 717925.7 0.005 1.321 cultints 0.793 5.648 288306954.8 0.017 1.898 roots -2.289 3.985 282059404.4 0.046 1.210 fish 0.577 5.292 1175554781.4 0.021 1.239 exogamy -0.973 6.541 82512394.7 0.011 1.132 settype -0.450 4.021 310791641.7 0.045 1.685 femsubs 0.632 4.061 380925164.6 0.044 1.241 > r2

```R2:final model R2:IV(distance) R2:IV(language)
0.1067222       0.9255103       0.9668244
```

> ccc

```               Fstat           df pvalue
```

RESET 0.682 2.344494e+06 0.409 Wald on restrs. 0.460 3.657830e+02 0.498 NCV 1.089 6.791347e+06 0.297 SWnormal 0.680 7.473392e+08 0.410 lagll 1.671 1.469917e+06 0.196 lagdd 3.376 1.306131e+07 0.066

A| this works! GOOD START HERE FOR OTHER PROJECTS Expanded xR to full xUR- ALL indvars w/ depvar="rape" JUST CHANGE YOUR DEPVAR

Program 1 --> Program 2

1. MI--make the imputed datasets
2. --change the following path to the directory with your data and program--

setwd("C:/My Documents/MI") rm(list=ls(all=TRUE)) options(echo=TRUE)

1. --you need the following two packages--you must install them first--

library(foreign) library(mice) library(tripak) library(zoo) library(sp) library(maptools) library(spam)

1. --To find the citation for a package, use this function:---

citation("mice")

1. -----------------------------
2. --Read in data, rearrange----
3. -----------------------------
1. --Read in auxiliary variables---

1. --Read in the SCCS dataset---

1. --look at first 6 rows of vaux--

1. --look at field names of vaux--

names(vaux)

1. --check to see that rows are properly aligned in the two datasets--
2. --sum should equal 186---

sum((SCCS\$socname==vaux\$socname)*1)

1. --remove the society name field--

vaux<-vaux[,-28] names(vaux)

1. --Two nominal variables: brg and rlg----
2. --brg: consolidated Burton Regions-----
3. 0 = (rest of world) circumpolar, South and Meso-America, west North America
4. 1 = Subsaharan Africa
5. 2 = Middle Old World
6. 3 = Southeast Asia, Insular Pacific, Sahul
7. 4 = Eastern Americas
8. --rlg: Religion---
9. '0 (no world religion)'
10. '1 (Christianity)'
11. '2 (Islam)'
12. '3 (Hindu/Buddhist)'
1. --check to see number of missing values in vaux,
2. --whether variables are numeric,
3. --and number of discrete values for each variable---

vvn<-names(vaux) pp<-NULL for (i in 1:length(vvn)){ nmiss<-length(which(is.na(vaux[,vvn[i]]))) numeric<-is.numeric(vaux[,vvn[i]]) numDiscrVals<-length(table(vaux[,vvn[i]])) pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals)) } row.names(pp)<-vvn pp

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --extract variables to be used from SCCS, put in dataframe fx--

fx<-data.frame( socname=SCCS\$socname,socID=SCCS\$"sccs#", valchild=(SCCS\$v473+SCCS\$v474+SCCS\$v475+SCCS\$v476), dateobs=SCCS\$v838,cultints=SCCS\$v232,roots=(SCCS\$v233==5)*1, cereals=(SCCS\$v233==6)*1,gath=SCCS\$v203,hunt=SCCS\$v204, fish=SCCS\$v205,anim=SCCS\$v206,femsubs=SCCS\$v890, pigs=(SCCS\$v244==2)*1,milk=(SCCS\$v245>1)*1,plow=(SCCS\$v243>1)*1, bovines=(SCCS\$v244==7)*1,tree=(SCCS\$v233==4)*1, foodtrade=SCCS\$v819,foodscarc=SCCS\$v1685, ecorich=SCCS\$v857,popdens=SCCS\$v156,pathstress=SCCS\$v1260, CVrain=SCCS\$v1914/SCCS\$v1913,rain=SCCS\$v854,temp=SCCS\$v855, AP1=SCCS\$v921,AP2=SCCS\$v928,ndrymonth=SCCS\$v196, exogamy=SCCS\$v72,ncmallow=SCCS\$v227, ### famsize=SCCS\$v80, settype=SCCS\$v234,localjh=(SCCS\$v236-1),superjh=SCCS\$v237, moralgods=SCCS\$v238,fempower=SCCS\$v663, sexratio=1+(SCCS\$v1689>85)+(SCCS\$v1689>115), war=SCCS\$v1648,himilexp=(SCCS\$v899==1)*1, money=SCCS\$v155,wagelabor=SCCS\$v1732, migr=(SCCS\$v677==2)*1,brideprice=(SCCS\$v208==1)*1, nuclearfam=(SCCS\$v210<=3)*1,pctFemPolyg=SCCS\$v872, nonmatrel=SCCS\$v52,lrgfam=SCCS\$v68,malesexag=SCCS\$v175, segadlboys=SCCS\$v242,agrlateboy=SCCS\$v300) ###ADDED

1. --look at first 6 rows of fx--

1. --check to see number of missing values--
2. --also check whether numeric--

vvn<-names(fx) pp<-NULL for (i in 1:length(vvn)){ nmiss<-length(which(is.na(fx[,vvn[i]]))) numeric<-is.numeric(fx[,vvn[i]]) pp<-rbind(pp,cbind(nmiss,data.frame(numeric))) } row.names(pp)<-vvn pp

1. --identify variables with missing values--

z<-which(pp[,1]>0) zv1<-vvn[z] zv1

1. --identify variables with non-missing values--

z<-which(pp[,1]==0) zv2<-vvn[z] zv2

1. -----------------------------
2. ----Multiple imputation------
3. -----------------------------
1. --number of imputed data sets to create--

nimp<-10

1. --one at a time, loop through those variables with missing values--

for (i in 1:length(zv1)){

1. --attach the imputand to the auxiliary data--

zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))

1. --in the following line, the imputation is done--

aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")

1. --during first iteration of the loop, create dataframe impdat--

if (i==1){ impdat<-data.frame(aqq[,c(".id",".imp")]) }

1. --the imputand is placed as a field in impdat and named--

impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)])) names(impdat)[NCOL(impdat)]<-zv1[i] }

1. --now the non-missing variables are attached to impdat--

gg<-NULL for (i in 1:nimp){ gg<-rbind(gg,data.frame(fx[,zv2])) } impdat<-cbind(impdat,gg)

1. --take a look at the top 6 and bottom 6 rows of impdat--

1. --impdat is saved as an R-format data file--

save(impdat,file="impdat.Rdata")

Program 2

1. MI--estimate model with network-lagged dependent variables, combine results

rm(list=ls(all=TRUE))

1. --Set path to your directory with data and program--

setwd("C:/My Documents/MI") options(echo=TRUE)

1. --need these packages for estimation and diagnostics--

library(foreign) library(spdep) library(car) library(lmtest) library(sandwich)

1. -----------------------------
2. --Read in data, rearrange----
3. -----------------------------
1. --Read in original SCCS data---

1. --Read in two weight matrices--

1. --Read in the imputed dataset---

1. HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
2. --create dep.varb. you wish to use from SCCS data--
3. --Here we sum variables measuring how much a society values children--
4. --can replace "sum" with "max"
1. depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum)

depvar<-SCCS\$v667###NEW

1. --find obs. for which dep. varb. is non-missing--

zdv<-which(!is.na(depvar)) depvar<-depvar[zdv]

1. HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
1. depvarname<-"childvar"

depvarname<-"rape"

1. --can add additional SCCS variable, but only if it has no missing values---
2. dateobs<-SCCS\$v838
3. dateobs<-dateobs[zdv]
1. --look at frequencies and quartiles for the dep. varb.--

summary(depvar) table(depvar)

1. --modify weight matrices---
2. --set diagonal equal to zeros--

diag(ll)<-0 diag(dd)<-0

1. --use only obs. where dep. varb. non-missing--

ll<-ll[zdv,zdv] dd<-dd[zdv,zdv]

1. --row standardize (rows sum to one)

ll<-ll/rowSums(ll) dd<-dd/rowSums(dd)

1. --make weight matrix object for later autocorrelation test--

wmatll<-mat2listw(as.matrix(ll)) wmatdd<-mat2listw(as.matrix(dd))

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT

indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods", "fempower","sexratio","war","himilexp","wagelabor","settype", #1# "famsize", "localjh","money","cultints","roots","cereals","gath","hunt","fish", "anim","pigs","milk","plow","bovines","tree","foodtrade","dateobs", "ndrymonth","ecorich","popdens","pathstress","CVrain","rain", "temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg", "nonmatrel","lrgfam","malesexag","segadlboys","agrlateboy") ###ADDED

1. -----------------------------------------------------
2. ---Estimate model on each imputed dataset------------
3. -----------------------------------------------------
1. --number of imputed datasets--

nimp<-10

1. --will append values to these empty objects--

vif<-NULL ss<-NULL beta<-NULL dng<-NULL

1. --loop through the imputed datasets--

for (i in 1:nimp){

1. --select the ith imputed dataset--

m9<-impdat[which(impdat\$.imp==i),]

1. --retain only obs. for which dep. varb. is nonmissing--

m9<-m9[zdv,]

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --create spatially lagged dep. varbs. in stage 1 OLS--

y<-as.matrix(depvar) xx<-as.matrix(m9[,indpv])

1. --for instruments we use the spatial lag of our indep. varbs.--
2. --First, the spatially lagged varb. for distance--

xdy<-dd%*%xx cyd<-dd%*%y o<-lm(cyd~xdy)

1. --the fitted value is our instrumental variable--

fydd<-fitted(o)

1. --keep R2 from this regression--

dr2<-summary(o)\$r.squared

1. --Then, the spatially lagged varb. for language--

xly<-ll%*%xx cyl<-ll%*%y o<-lm(cyl~xly)

1. --the fitted value is our instrumental variable--

fyll<-fitted(o)

1. --keep R2 from this regression--

lr2<-summary(o)\$r.squared m9<-cbind(m9,fydd,fyll)

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --Stage 2 OLS estimate of unrestricted model--

xUR<-lm(depvar~fyll+fydd+dateobs+ cultints+roots+cereals+gath+plow+ hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+ ecorich+popdens+pathstress+exogamy+ncmallow+ ### famsize+ settype+localjh+superjh+moralgods+fempower+femsubs+ sexratio+war+himilexp+money+wagelabor+ migr+brideprice+nuclearfam+pctFemPolyg+ nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy,data=m9) ###ADDED

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --Stage 2 OLS estimate of restricted model--
1. xR<-lm(depvar ~ fyll + cultints + roots + fish +
2. exogamy + settype + femsubs, data = m9)
```  xR<-lm(depvar~fyll+fydd+dateobs+
```

cultints+roots+cereals+gath+plow+ hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+ ecorich+popdens+pathstress+exogamy+ncmallow+ ###famsize+ settype+localjh+superjh+moralgods+fempower+femsubs+ sexratio+war+himilexp+money+wagelabor+ migr+brideprice+nuclearfam+pctFemPolyg +nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy , data = m9) ###ADDED

1. --corrected sigma2 and R2 for 2SLS--

qxx<-m9 qxx[,"fydd"]<-cyd qxx[,"fyll"]<-cyl b<-coef(xR) incpt<-matrix(1,NROW(qxx),1) x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]])) e<-y-x%*%as.matrix(b) cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x))) cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

1. --collect coefficients and their variances--

ov<-summary(xR) vif<-rbind(vif,vif(xR)) ss<-rbind(ss,diag(ov\$cov*cs2))

1. --collect robust coef. variances when there is heteroskedasticity--
2. eb<-e^2
3. x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
4. hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
5. ss<-rbind(ss,diag(hcm))

beta<-rbind(beta,coef(xR))

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --collect some model diagnostics--

dropt<-c("cereals","gath","plow","hunt","anim","dateobs", "pigs","milk","bovines","foodscarc","ecorich","localjh", #1# "famsize", "superjh","moralgods","fempower","sexratio","money", "fydd","wagelabor","war","himilexp","tree","foodtrade")

1. --Ramsey RESET test--

p1<-qchisq(resettest(xR,type="fitted")\$"p.value",1,lower.tail=FALSE)

1. --Wald test (H0: dropped variables have coefficient equal zero)--

o<-linear.hypothesis(xUR,dropt,test="Chisq")\$"Pr(>Chisq)"[2] p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue

1. --Heteroskedasticity test (H0: homoskedastic residuals)--

p3<-ncv.test(xR)\$ChiSquare

1. --Shapiro-Wilke normality test (H0: residuals normal)

p4<-qchisq(shapiro.test(e)\$p.value,1,lower.tail=FALSE)

1. --LaGrange Multiplier test for spatial autocorrelation: language--

o<-lm.LMtests(xR, wmatll, test=c("LMlag")) p5<-as.numeric(o\$LMlag\$statistic)

1. --LaGrange Multiplier test for spatial autocorrelation: distance--

o<-lm.LMtests(xR, wmatdd, test=c("LMlag")) p6<-as.numeric(o\$LMlag\$statistic)

1. --model R2--

p7<-cr2 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

1. --------------------------------------------
2. --Rubin's formulas for combining estimates--
3. --------------------------------------------
1. --first find final regr. coefs. and p-values--

mnb<-apply(beta,2,mean) vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1) mnv<-apply(ss,2,mean) vrT<-mnv+vrb*(1-nimp^(-1)) fst<-mnb^2/vrT r<-(1+nimp^(-1))*vrb/mnv v<-(nimp-1)*(1+r^(-1))^2 pval<-pf(fst,1,v,lower.tail=FALSE) bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) bbb\$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3) names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

1. --Then combine the diagnostics we collected--

dng<-data.frame(dng) names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd", "R2:final model","R2:IV(distance)","R2:IV(language)") r2<-apply(dng[,7:9],2,mean) adng<-dng[,1:6] mdm<-apply(adng,2,mean) vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1) aa<-4*mdm^2-2*vrd aa[which(aa<0)]<-0 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5) vd<-(nimp-1)*(1+rd^(-1))^2 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)

1. -All chi-sq we collected have df=1-------

pvald<-pf(Dm,1,vd,lower.tail=FALSE) ccc<-data.frame(round(cbind(Dm,vd,pvald),3)) names(ccc)<-c("Fstat","df","pvalue")

aaa<-c(table(depvar), NROW(depvar),depvarname) nn aaa bbb r2 ccc

1. Corrected to publication version with depvarname
2. --write results to csv file for perusal in spreadsheet--

write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE) write.csv(bbb,file="OLSresults.csv",append=TRUE) write.csv(r2,file="OLSresults.csv",append=TRUE) write.csv(ccc,file="OLSresults.csv",append=TRUE) Results for rape667

```   1      2
"45"   "50"   "95" "rape667"  bbb
coef Fstat       ddf pvalue    VIF
```

(Intercept) 2.692 0.456 3574.894 0.500 NA fyll -0.748 0.302 1465.228 0.583 3.636 fydd 0.445 0.249 405.432 0.618 3.195 dateobs 0.000 0.095 922.640 0.758 3.428 cultints 0.071 0.552 4263.890 0.457 8.996 <-- delete roots 0.277 0.442 1219.314 0.506 9.332 <-- delete cereals 0.384 0.802 485.937 0.371 12.589 <-- delete gath 0.020 0.065 928.673 0.799 4.712 plow 0.246 0.439 2010.000 0.508 4.469 hunt 0.063 0.551 1701.610 0.458 7.602 <-- delete fish -0.016 0.049 568.527 0.825 5.958 <-- delete anim 0.053 0.421 1431.601 0.516 10.204 <-- delete pigs -0.220 0.642 1312.112 0.423 3.793 milk 0.024 0.004 1155.939 0.947 7.002 <-- delete bovines -0.237 0.507 1196.446 0.477 6.130 <-- delete tree 0.527 1.103 617.319 0.294 6.526 <-- delete foodtrade 0.005 0.433 537.598 0.511 1.760 foodscarc 0.026 0.164 113.051 0.687 1.609 ecorich -0.028 0.130 19494.353 0.719 3.338 popdens -0.016 0.037 994.890 0.847 5.240 pathstress -0.027 0.735 346.147 0.392 3.895 exogamy -0.029 0.180 23463.156 0.672 2.209 ncmallow -0.024 0.501 1868.114 0.479 2.101 settype 0.039 0.357 514.253 0.551 7.733 localjh -0.039 0.055 1032.192 0.815 2.958 superjh -0.163 2.709 3051.499 0.100 3.665 moralgods -0.062 0.524 505.639 0.469 2.631 fempower 0.016 0.107 117.215 0.745 2.059 femsubs 0.017 0.062 2097.553 0.803 3.311 sexratio -0.130 0.886 261.329 0.347 2.138 war 0.009 0.450 2760.180 0.502 2.796 himilexp -0.067 0.149 3084.584 0.700 2.403 money 0.065 0.928 1594.700 0.336 2.668 wagelabor 0.046 0.246 643.645 0.620 2.049 migr 0.105 0.214 43.800 0.646 2.379 brideprice -0.042 0.039 914.626 0.843 3.359 nuclearfam -0.326 1.505 220.816 0.221 4.318 pctFemPolyg 0.003 0.531 243.410 0.467 2.741 nonmatrel 0.055 0.299 467.928 0.585 1.888 lrgfam -0.024 0.860 17608.647 0.354 2.888 malesexag 0.093 2.392 82.357 0.126 2.024 segadlboys 0.000 0.000 464.120 0.993 2.198 agrlateboy -0.019 0.170 4072.023 0.680 2.466 > r2

```R2:final model R2:IV(distance) R2:IV(language)
0.3797343       0.9304957       0.9445135
```

> ccc

```                Fstat         df pvalue
```

RESET 2.281 110.084 0.134 Wald on restrs. -0.002 232584.513 1.000 NCV 0.439 119.020 0.509 SWnormal 2.521 148.821 0.114 lagll 0.933 17759.718 0.334 lagdd 1.148 179491.669 0.284

A| delete High VIFs for 667 rape==this works! GOOD START HERE FOR OTHER PROJECTS Expanded xR to full xUR- ALL indvars w/ depvar="rape" JUST CHANGE YOUR DEPVAR

Program 1 --> Program 2

1. MI--make the imputed datasets
2. --change the following path to the directory with your data and program--

setwd("C:/My Documents/MI") rm(list=ls(all=TRUE)) options(echo=TRUE)

1. --you need the following two packages--you must install them first--

library(foreign) library(mice) library(tripak) library(zoo) library(sp) library(maptools) library(spam)

1. --To find the citation for a package, use this function:---

citation("mice")

1. -----------------------------
2. --Read in data, rearrange----
3. -----------------------------
1. --Read in auxiliary variables---

1. --Read in the SCCS dataset---

1. --look at first 6 rows of vaux--

1. --look at field names of vaux--

names(vaux)

1. --check to see that rows are properly aligned in the two datasets--
2. --sum should equal 186---

sum((SCCS\$socname==vaux\$socname)*1)

1. --remove the society name field--

vaux<-vaux[,-28] names(vaux)

1. --Two nominal variables: brg and rlg----
2. --brg: consolidated Burton Regions-----
3. 0 = (rest of world) circumpolar, South and Meso-America, west North America
4. 1 = Subsaharan Africa
5. 2 = Middle Old World
6. 3 = Southeast Asia, Insular Pacific, Sahul
7. 4 = Eastern Americas
8. --rlg: Religion---
9. '0 (no world religion)'
10. '1 (Christianity)'
11. '2 (Islam)'
12. '3 (Hindu/Buddhist)'
1. --check to see number of missing values in vaux,
2. --whether variables are numeric,
3. --and number of discrete values for each variable---

vvn<-names(vaux) pp<-NULL for (i in 1:length(vvn)){ nmiss<-length(which(is.na(vaux[,vvn[i]]))) numeric<-is.numeric(vaux[,vvn[i]]) numDiscrVals<-length(table(vaux[,vvn[i]])) pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals)) } row.names(pp)<-vvn pp

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --extract variables to be used from SCCS, put in dataframe fx--

fx<-data.frame( socname=SCCS\$socname,socID=SCCS\$"sccs#", valchild=(SCCS\$v473+SCCS\$v474+SCCS\$v475+SCCS\$v476), dateobs=SCCS\$v838,cultints=SCCS\$v232,roots=(SCCS\$v233==5)*1, cereals=(SCCS\$v233==6)*1,gath=SCCS\$v203,hunt=SCCS\$v204, fish=SCCS\$v205,anim=SCCS\$v206,femsubs=SCCS\$v890, pigs=(SCCS\$v244==2)*1,milk=(SCCS\$v245>1)*1,plow=(SCCS\$v243>1)*1, bovines=(SCCS\$v244==7)*1,tree=(SCCS\$v233==4)*1, foodtrade=SCCS\$v819,foodscarc=SCCS\$v1685, ecorich=SCCS\$v857,popdens=SCCS\$v156,pathstress=SCCS\$v1260, CVrain=SCCS\$v1914/SCCS\$v1913,rain=SCCS\$v854,temp=SCCS\$v855, AP1=SCCS\$v921,AP2=SCCS\$v928,ndrymonth=SCCS\$v196, exogamy=SCCS\$v72,ncmallow=SCCS\$v227, ### famsize=SCCS\$v80, settype=SCCS\$v234,localjh=(SCCS\$v236-1),superjh=SCCS\$v237, moralgods=SCCS\$v238,fempower=SCCS\$v663, sexratio=1+(SCCS\$v1689>85)+(SCCS\$v1689>115), war=SCCS\$v1648,himilexp=(SCCS\$v899==1)*1, money=SCCS\$v155,wagelabor=SCCS\$v1732, migr=(SCCS\$v677==2)*1,brideprice=(SCCS\$v208==1)*1, nuclearfam=(SCCS\$v210<=3)*1,pctFemPolyg=SCCS\$v872, nonmatrel=SCCS\$v52,lrgfam=SCCS\$v68,malesexag=SCCS\$v175, segadlboys=SCCS\$v242,agrlateboy=SCCS\$v300) ###ADDED

1. --look at first 6 rows of fx--

1. --check to see number of missing values--
2. --also check whether numeric--

vvn<-names(fx) pp<-NULL for (i in 1:length(vvn)){ nmiss<-length(which(is.na(fx[,vvn[i]]))) numeric<-is.numeric(fx[,vvn[i]]) pp<-rbind(pp,cbind(nmiss,data.frame(numeric))) } row.names(pp)<-vvn pp

1. --identify variables with missing values--

z<-which(pp[,1]>0) zv1<-vvn[z] zv1

1. --identify variables with non-missing values--

z<-which(pp[,1]==0) zv2<-vvn[z] zv2

1. -----------------------------
2. ----Multiple imputation------
3. -----------------------------
1. --number of imputed data sets to create--

nimp<-10

1. --one at a time, loop through those variables with missing values--

for (i in 1:length(zv1)){

1. --attach the imputand to the auxiliary data--

zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))

1. --in the following line, the imputation is done--

aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")

1. --during first iteration of the loop, create dataframe impdat--

if (i==1){ impdat<-data.frame(aqq[,c(".id",".imp")]) }

1. --the imputand is placed as a field in impdat and named--

impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)])) names(impdat)[NCOL(impdat)]<-zv1[i] }

1. --now the non-missing variables are attached to impdat--

gg<-NULL for (i in 1:nimp){ gg<-rbind(gg,data.frame(fx[,zv2])) } impdat<-cbind(impdat,gg)

1. --take a look at the top 6 and bottom 6 rows of impdat--

1. --impdat is saved as an R-format data file--

save(impdat,file="impdat.Rdata")

Program 2

1. MI--estimate model with network-lagged dependent variables, combine results

rm(list=ls(all=TRUE))

1. --Set path to your directory with data and program--

setwd("C:/My Documents/MI") options(echo=TRUE)

1. --need these packages for estimation and diagnostics--

library(foreign) library(spdep) library(car) library(lmtest) library(sandwich)

1. -----------------------------
2. --Read in data, rearrange----
3. -----------------------------
1. --Read in original SCCS data---

1. --Read in two weight matrices--

1. --Read in the imputed dataset---

1. HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
2. --create dep.varb. you wish to use from SCCS data--
3. --Here we sum variables measuring how much a society values children--
4. --can replace "sum" with "max"
1. depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum)

depvar<-SCCS\$v667###NEW

1. --find obs. for which dep. varb. is non-missing--

zdv<-which(!is.na(depvar)) depvar<-depvar[zdv]

1. HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
1. depvarname<-"childvar"

depvarname<-"rape"

1. --can add additional SCCS variable, but only if it has no missing values---
2. dateobs<-SCCS\$v838
3. dateobs<-dateobs[zdv]
1. --look at frequencies and quartiles for the dep. varb.--

summary(depvar) table(depvar)

1. --modify weight matrices---
2. --set diagonal equal to zeros--

diag(ll)<-0 diag(dd)<-0

1. --use only obs. where dep. varb. non-missing--

ll<-ll[zdv,zdv] dd<-dd[zdv,zdv]

1. --row standardize (rows sum to one)

ll<-ll/rowSums(ll) dd<-dd/rowSums(dd)

1. --make weight matrix object for later autocorrelation test--

wmatll<-mat2listw(as.matrix(ll)) wmatdd<-mat2listw(as.matrix(dd))

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT

indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods", "fempower","sexratio","war","himilexp","wagelabor","settype", #1# "famsize", "localjh","money","cultints","roots","cereals","gath","hunt","fish", "anim","pigs","milk","plow","bovines","tree","foodtrade","dateobs", "ndrymonth","ecorich","popdens","pathstress","CVrain","rain", "temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg", "nonmatrel","lrgfam","malesexag","segadlboys","agrlateboy") ###ADDED

1. -----------------------------------------------------
2. ---Estimate model on each imputed dataset------------
3. -----------------------------------------------------
1. --number of imputed datasets--

nimp<-10

1. --will append values to these empty objects--

vif<-NULL ss<-NULL beta<-NULL dng<-NULL

1. --loop through the imputed datasets--

for (i in 1:nimp){

1. --select the ith imputed dataset--

m9<-impdat[which(impdat\$.imp==i),]

1. --retain only obs. for which dep. varb. is nonmissing--

m9<-m9[zdv,]

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --create spatially lagged dep. varbs. in stage 1 OLS--

y<-as.matrix(depvar) xx<-as.matrix(m9[,indpv])

1. --for instruments we use the spatial lag of our indep. varbs.--
2. --First, the spatially lagged varb. for distance--

xdy<-dd%*%xx cyd<-dd%*%y o<-lm(cyd~xdy)

1. --the fitted value is our instrumental variable--

fydd<-fitted(o)

1. --keep R2 from this regression--

dr2<-summary(o)\$r.squared

1. --Then, the spatially lagged varb. for language--

xly<-ll%*%xx cyl<-ll%*%y o<-lm(cyl~xly)

1. --the fitted value is our instrumental variable--

fyll<-fitted(o)

1. --keep R2 from this regression--

lr2<-summary(o)\$r.squared m9<-cbind(m9,fydd,fyll)

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --Stage 2 OLS estimate of unrestricted model--

xUR<-lm(depvar~fyll+fydd+dateobs+ cultints+roots+cereals+gath+plow+ hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+ ecorich+popdens+pathstress+exogamy+ncmallow+ ### famsize+ settype+localjh+superjh+moralgods+fempower+femsubs+ sexratio+war+himilexp+money+wagelabor+ migr+brideprice+nuclearfam+pctFemPolyg+ nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy,data=m9) ###ADDED

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --Stage 2 OLS estimate of restricted model--
1. xR<-lm(depvar ~ fyll + cultints + roots + fish +
2. exogamy + settype + femsubs, data = m9)
```  xR<-lm(depvar~fyll+fydd+dateobs+
```

plow+ # cultints+roots+cereals+gath+ pigs+foodtrade+foodscarc+ #hunt+fish+anim+ ecorich+popdens+pathstress+exogamy+ncmallow+ ###famsize+milk+bovines+tree+ settype+localjh+superjh+moralgods+fempower+femsubs+ sexratio+war+himilexp+money+wagelabor+ migr+brideprice+nuclearfam+pctFemPolyg +nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy , data = m9) ###ADDED

1. Nov9# cultints 0.071 0.552 4263.890 0.457 8.996 <-- delete
2. Nov9# roots 0.277 0.442 1219.314 0.506 9.332 <-- delete
3. Nov9# cereals 0.384 0.802 485.937 0.371 12.589 <-- delete
4. Nov9# hunt 0.063 0.551 1701.610 0.458 7.602 <-- delete
5. Nov9# fish -0.016 0.049 568.527 0.825 5.958 <-- delete
6. Nov9# anim 0.053 0.421 1431.601 0.516 10.204 <-- delete
7. Nov9# milk 0.024 0.004 1155.939 0.947 7.002 <-- delete
8. Nov9# bovines -0.237 0.507 1196.446 0.477 6.130 <-- delete
9. Nov9# tree 0.527 1.103 617.319 0.294 6.526 <-- delete
10. --corrected sigma2 and R2 for 2SLS--

qxx<-m9 qxx[,"fydd"]<-cyd qxx[,"fyll"]<-cyl b<-coef(xR) incpt<-matrix(1,NROW(qxx),1) x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]])) e<-y-x%*%as.matrix(b) cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x))) cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

1. --collect coefficients and their variances--

ov<-summary(xR) vif<-rbind(vif,vif(xR)) ss<-rbind(ss,diag(ov\$cov*cs2))

1. --collect robust coef. variances when there is heteroskedasticity--
2. eb<-e^2
3. x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
4. hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
5. ss<-rbind(ss,diag(hcm))

beta<-rbind(beta,coef(xR))

1. MODIFY THESE STATEMENTS FOR A NEW PROJECT
2. --collect some model diagnostics--

dropt<-c("cereals","gath","plow","hunt","anim","dateobs", "pigs","milk","bovines","foodscarc","ecorich","localjh", #1# "famsize", "superjh","moralgods","fempower","sexratio","money", "fydd","wagelabor","war","himilexp","tree","foodtrade")

1. --Ramsey RESET test--

p1<-qchisq(resettest(xR,type="fitted")\$"p.value",1,lower.tail=FALSE)

1. --Wald test (H0: dropped variables have coefficient equal zero)--

o<-linear.hypothesis(xUR,dropt,test="Chisq")\$"Pr(>Chisq)"[2] p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue

1. --Heteroskedasticity test (H0: homoskedastic residuals)--

p3<-ncv.test(xR)\$ChiSquare

1. --Shapiro-Wilke normality test (H0: residuals normal)

p4<-qchisq(shapiro.test(e)\$p.value,1,lower.tail=FALSE)

1. --LaGrange Multiplier test for spatial autocorrelation: language--

o<-lm.LMtests(xR, wmatll, test=c("LMlag")) p5<-as.numeric(o\$LMlag\$statistic)

1. --LaGrange Multiplier test for spatial autocorrelation: distance--

o<-lm.LMtests(xR, wmatdd, test=c("LMlag")) p6<-as.numeric(o\$LMlag\$statistic)

1. --model R2--

p7<-cr2 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

1. --------------------------------------------
2. --Rubin's formulas for combining estimates--
3. --------------------------------------------
1. --first find final regr. coefs. and p-values--

mnb<-apply(beta,2,mean) vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1) mnv<-apply(ss,2,mean) vrT<-mnv+vrb*(1-nimp^(-1)) fst<-mnb^2/vrT r<-(1+nimp^(-1))*vrb/mnv v<-(nimp-1)*(1+r^(-1))^2 pval<-pf(fst,1,v,lower.tail=FALSE) bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) bbb\$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3) names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

1. --Then combine the diagnostics we collected--

dng<-data.frame(dng) names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd", "R2:final model","R2:IV(distance)","R2:IV(language)") r2<-apply(dng[,7:9],2,mean) adng<-dng[,1:6] mdm<-apply(adng,2,mean) vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1) aa<-4*mdm^2-2*vrd aa[which(aa<0)]<-0 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5) vd<-(nimp-1)*(1+rd^(-1))^2 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)

1. -All chi-sq we collected have df=1-------

pvald<-pf(Dm,1,vd,lower.tail=FALSE) ccc<-data.frame(round(cbind(Dm,vd,pvald),3)) names(ccc)<-c("Fstat","df","pvalue")

aaa<-c(table(depvar), NROW(depvar),depvarname) nn aaa bbb r2 ccc

1. Corrected to publication version with depvarname
2. --write results to csv file for perusal in spreadsheet--

write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE) write.csv(bbb,file="OLSresults.csv",append=TRUE) write.csv(r2,file="OLSresults.csv",append=TRUE) write.csv(ccc,file="OLSresults.csv",append=TRUE) Results

Still none significant... try a crosstab of v667 and v173 if they both measure rape by different criteria, try adding them for a new depvar perhaps.

```    1      2
"45"   "50"   "95" "rape"
```

> bbb

```             coef Fstat      ddf pvalue   VIF
```

(Intercept) 3.706 1.089 2695.997 0.297 NA fyll -0.357 0.089 339.352 0.766 2.795 fydd -0.042 0.004 1586.556 0.948 1.955 dateobs -0.001 0.513 749.486 0.474 3.066 plow 0.296 1.142 382.980 0.286 2.454 pigs -0.136 0.357 564.577 0.551 2.691 foodtrade 0.003 0.276 924.201 0.599 1.447 foodscarc 0.028 0.243 172.438 0.622 1.431 ecorich -0.058 0.704 4074.456 0.401 2.849 popdens 0.027 0.148 1691.357 0.700 4.160 pathstress -0.006 0.051 862.936 0.822 3.071 exogamy -0.031 0.250 3845.623 0.617 1.898 ncmallow -0.024 0.729 8503.102 0.393 1.682 settype 0.017 0.157 1562.520 0.692 3.891 localjh 0.043 0.098 3941.985 0.754 2.227 superjh -0.132 2.631 7049.960 0.105 2.705 moralgods -0.031 0.185 373.912 0.667 1.913 fempower -0.015 0.171 386.952 0.679 1.495 femsubs 0.042 0.539 1479.476 0.463 2.381 sexratio -0.073 0.364 144.494 0.547 1.659 war 0.006 0.205 1200.947 0.651 2.521 himilexp 0.086 0.334 757.361 0.563 1.809 money 0.019 0.109 4392.456 0.742 2.173 wagelabor 0.021 0.062 341.189 0.803 1.667 migr 0.206 1.282 55.543 0.262 1.796 brideprice -0.008 0.002 616.223 0.966 2.648 nuclearfam -0.235 0.996 281.301 0.319 3.710 pctFemPolyg 0.003 0.985 373.914 0.322 2.203 nonmatrel 0.090 1.109 459.656 0.293 1.434 lrgfam -0.017 0.534 7146.785 0.465 2.387 malesexag 0.073 1.777 62.285 0.187 1.700 segadlboys -0.007 0.022 1370.964 0.882 1.867 agrlateboy 0.003 0.005 4082.081 0.943 1.968 > r2

```R2:final model R2:IV(distance) R2:IV(language)
0.3094084       0.9304957       0.9445135
```

> ccc

```                Fstat         df pvalue
```

RESET 1.035 536.787 0.309 Wald on restrs. -0.002 232584.513 1.000 NCV 0.344 614.886 0.558 SWnormal 3.494 103.829 0.064 lagll 0.821 51348.220 0.365 lagdd 0.842 80434.608 0.359