EduMod-11: (your dep var here) Imputation and Regression

From InterSciWiki

Jump to: navigation, search

User:Amanda McDonald -- User talk:Amanda McDonald - this is a good copy of EduMod-3 or -4 as of 10/1/09--DRW. RUN THE MAIN PROGRAM AND THEN RUN THE WORKSHOP PROGRAM. Copy this entire file to your EduMod-X page as instructed.

Contents

[edit] Your workshop

Back to Imputing_data_for_Regression_Analysis#EduMod

Your main work will be done here after you get your first results with the (4 Programs: Copy and paste) program and paste them below. The demonstration program (Eff and Dow') shows only the final step not the intermediate ones. Here I post the last part of the program. Here you start again to re-estimate the model, this time substituting in xR= (final variables so that xR=xUR (all independent variables). That is done by copying the list of variables that you see after

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of unrestricted model--

(but dont copy the xUR<-lm itself) down to replace the list of xR<-lm variables (the function lm(depvar~list of indep vars) is what does the multiple linear regression, lm="linear model". You will see I have ###commented out the original xR<-lm. Note that for your project rather than this demo you will want also to change your depvar and add to the list of independent variables, working from the codebook, using the variables numbers, like v860, and giving them names in program 1. You probably dont have to change the excluded variables under

#--collect some model diagnostics--

(NOTE: all program statements in the wiki must have a space in the first column)

Now, in a real project, the results you will get by setting xR==xUR need to culled into two sets: those with p<.05 and those to exclude. The analysis of excluded variables when properly specified (the two sets should really be mutually exclusive) will tell you whether you have omitted anything that should be excluded. Read EFF and DOW carefully for this step.

IMPORTANT: put both the FULL SET of results for all your xUR variables, which you will obtain here, and the CULLED SET of results for your xR variables, into the RESULTS section of this page so you and I can check whether you have done the analysis correctly.

[edit] Workshop program

Program 2: Modified for all xR=xUR except dateobs - should take about 40 seconds
#MI--estimate model with network-lagged dependent variables, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("c:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)

#HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#--can replace "sum" with "max"
###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) 
depvar<-SCCS$v667###NEW
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
depvarname<-"Rape"###"child value"
#--can add additional SCCS variable, but only if it has no missing values---
dateobs<-SCCS$v838
dateobs<-dateobs[zdv]

#--look at frequencies and quartiles for the dep. varb.--
summary(depvar)
table(depvar)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
"fempower","sexratio","war","himilexp","wagelabor","famsize","settype",
"localjh","money","cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
"temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg",
"nonmatrel","lrgfam","malesexag","segadlboys","agrlateboy") ###ADDED

#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--create spatially lagged dep. varbs. in stage 1 OLS--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx   
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+dateobs+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
+ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg
+nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy###ADDED
,data=m9)

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of restricted model--
###xR<-lm(depvar ~ fyll + cultints + roots + fish + 
###    exogamy + settype + femsubs, data = m9)
###THIS IS JUST PROGRAM 2 EXCEPT DOUG WHITE REDEFINED xR=same as xUR but deleted "dateobs," in line 2
 xR<-lm(depvar~fyll+fydd+
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
 settype+localjh+superjh+moralgods+fempower+femsubs+
 sexratio+war+himilexp+money+wagelabor+
 migr+brideprice+nuclearfam+pctFemPolyg
 +nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy###ADDED
 ,data=m9)

#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim","dateobs",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"fydd","wagelabor","war","himilexp","tree","foodtrade")


#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(e)$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc

#--write results to csv file for perusal in spreadsheet--
write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] Results

[edit] From workshop

all xR=xUR except dateobs. Your next step if this were a new project would be to keep xR=only those with p<.05

bbb
              coef Fstat       ddf pvalue   VIF
(Intercept) -3.778 0.048  3518.772  0.827    NA
fyll         2.092 5.493 72408.940  0.019 4.311
fydd        -0.701 1.525 27950.568  0.217 3.570
cultints     1.090 3.768  6088.808  0.052 5.222
roots       -4.595 3.656  3015.253  0.056 5.091
cereals     -1.476 0.386  1699.391  0.534 7.508
gath        -0.492 0.925  3771.955  0.336 3.224
plow        -2.235 1.249  5935.504  0.264 3.178
hunt        -0.162 0.083  7873.798  0.773 5.378
fish         0.273 0.432  1534.074  0.511 3.200
anim        -0.138 0.079 22553.987  0.779 5.968
pigs         0.560 0.092  3241.095  0.762 2.233
milk        -1.520 0.681  5301.829  0.409 3.970
bovines      1.870 1.011  2777.386  0.315 4.355
tree        -5.090 2.826  9236.350  0.093 3.306
foodtrade    0.091 3.101  6431.573  0.078 1.613
foodscarc   -0.385 0.947   490.565  0.331 1.306
ecorich     -0.198 0.178  1018.154  0.673 1.843
popdens     -0.377 0.503  5074.414  0.478 3.888
pathstress  -0.119 0.391  4055.907  0.532 2.841
exogamy     -0.927 4.464 13709.866  0.035 1.478
ncmallow    -0.137 0.387   925.286  0.534 1.682
famsize      0.297 0.335  3180.405  0.563 2.211
settype     -0.532 2.115  3169.088  0.146 4.289
localjh     -0.572 0.326  1678.280  0.568 1.889
superjh     -0.080 0.019  1443.457  0.890 2.719
moralgods    0.162 0.079   274.976  0.779 2.265
fempower     0.302 1.043   219.597  0.308 1.380
femsubs      0.932 5.737 16349.251  0.017 1.880
sexratio    -0.369 0.152   114.773  0.697 1.385
war         -0.104 1.804   356.775  0.180 1.396
himilexp     1.149 1.009   712.762  0.315 1.621
money        0.369 0.700  6880.529  0.403 2.362
wagelabor   -0.959 1.862    85.903  0.176 1.537
migr         0.352 0.092   248.280  0.762 1.567
brideprice  -0.927 0.559  8133.950  0.455 2.032
nuclearfam  -0.886 0.383  2955.185  0.536 2.374
pctFemPolyg  0.003 0.013   278.994  0.908 1.831
>  r2
 R2:final model R2:IV(distance) R2:IV(language) 
      0.2735421       0.9259708       0.9666807 
>  ccc
                 Fstat         df pvalue
RESET            2.619    159.018  0.108
Wald on restrs.  0.286    200.401  0.594
NCV             -0.015    190.566  1.000
SWnormal         0.172   1898.653  0.678
lagll            3.103 530583.685  0.078
lagdd            4.352 402962.502  0.037

[edit] Programs: Copy and paste

These 2 programs took 48 seconds on my computer Doug 08:57, 1 October 2009 (PDT)
Program 1 --> Program 2
#MI--make the imputed datasets
#--change the following path to the directory with your data and program--
setwd("C:/My Documents/MI")
rm(list=ls(all=TRUE))
options(echo=TRUE)
#--you need the following two packages--you must install them first--
library(foreign)
library(mice)
library(tripak)
library(zoo)
library(sp)
library(maptools)
library(spam)

#--To find the citation for a package, use this function:---
citation("mice")

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in auxiliary variables---
load("vaux.Rdata",.GlobalEnv)
row.names(vaux)<-NULL
#--Read in the SCCS dataset---
load("SCCS.Rdata",.GlobalEnv)

#--look at first 6 rows of vaux--
head(vaux)
#--look at field names of vaux--
names(vaux)
#--check to see that rows are properly aligned in the two datasets--
#--sum should equal 186---
sum((SCCS$socname==vaux$socname)*1)
#--remove the society name field--
vaux<-vaux[,-28]
names(vaux)

#--Two nominal variables: brg and rlg----
#--brg: consolidated Burton  Regions-----
#0 = (rest of world) circumpolar, South and Meso-America, west North America
#1 = Subsaharan Africa
#2 = Middle Old World
#3 = Southeast Asia, Insular Pacific, Sahul
#4 = Eastern Americas
#--rlg: Religion---
#'0 (no world religion)'  
#'1 (Christianity)'  
#'2 (Islam)'  
#'3 (Hindu/Buddhist)'  

#--check to see number of missing values in vaux, 
#--whether variables are numeric,
#--and number of discrete values for each variable---
vvn<-names(vaux)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(vaux[,vvn[i]])))
numeric<-is.numeric(vaux[,vvn[i]])
numDiscrVals<-length(table(vaux[,vvn[i]]))
pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
}
row.names(pp)<-vvn
pp

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--extract variables to be used from SCCS, put in dataframe fx--
fx<-data.frame(
socname=SCCS$socname,socID=SCCS$"sccs#",
valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
cultints=SCCS$v232,roots=(SCCS$v233==5)*1,
cereals=(SCCS$v233==6)*1,gath=SCCS$v203,hunt=SCCS$v204,
fish=SCCS$v205,anim=SCCS$v206,femsubs=SCCS$v890,
pigs=(SCCS$v244==2)*1,milk=(SCCS$v245>1)*1,plow=(SCCS$v243>1)*1,
bovines=(SCCS$v244==7)*1,tree=(SCCS$v233==4)*1,
foodtrade=SCCS$v819,foodscarc=SCCS$v1685,
ecorich=SCCS$v857,popdens=SCCS$v156,pathstress=SCCS$v1260,
CVrain=SCCS$v1914/SCCS$v1913,rain=SCCS$v854,temp=SCCS$v855,
AP1=SCCS$v921,AP2=SCCS$v928,ndrymonth=SCCS$v196,
exogamy=SCCS$v72,ncmallow=SCCS$v227,famsize=SCCS$v80,
settype=SCCS$v234,localjh=(SCCS$v236-1),superjh=SCCS$v237,
moralgods=SCCS$v238,fempower=SCCS$v663,
sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
war=SCCS$v1648,himilexp=(SCCS$v899==1)*1,
money=SCCS$v155,wagelabor=SCCS$v1732,
migr=(SCCS$v677==2)*1,brideprice=(SCCS$v208==1)*1,
nuclearfam=(SCCS$v210<=3)*1,pctFemPolyg=SCCS$v872,
nonmatrel=SCCS$v52,lrgfam=SCCS$v68,malesexag=SCCS$v175,
segadlboys=SCCS$v242,agrlateboy=SCCS$v300) ###ADDED)

#--look at first 6 rows of fx--
head(fx)

#--check to see number of missing values--
#--also check whether numeric--
vvn<-names(fx)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(fx[,vvn[i]])))
numeric<-is.numeric(fx[,vvn[i]])
pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
}
row.names(pp)<-vvn
pp

#--identify variables with missing values--
z<-which(pp[,1]>0)
zv1<-vvn[z]
zv1
#--identify variables with non-missing values--
z<-which(pp[,1]==0)
zv2<-vvn[z]
zv2

#-----------------------------
#----Multiple imputation------
#-----------------------------

#--number of imputed data sets to create--
nimp<-10
#--one at a time, loop through those variables with missing values--
for (i in 1:length(zv1)){
#--attach the imputand to the auxiliary data--
zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
#--in the following line, the imputation is done--
aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")
#--during first iteration of the loop, create dataframe impdat--
if (i==1){
impdat<-data.frame(aqq[,c(".id",".imp")])
}
#--the imputand is placed as a field in impdat and named--
impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
names(impdat)[NCOL(impdat)]<-zv1[i]
}

#--now the non-missing variables are attached to impdat--
gg<-NULL
for (i in 1:nimp){
gg<-rbind(gg,data.frame(fx[,zv2]))
}
impdat<-cbind(impdat,gg)

#--take a look at the top 6 and bottom 6 rows of impdat--
head(impdat)
tail(impdat)

#--impdat is saved as an R-format data file--
save(impdat,file="impdat.Rdata")


 
Program 2
#MI--estimate model with network-lagged dependent variables, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)

#HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#--can replace "sum" with "max"
depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) 
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
depvarname<-"child value"
#--can add additional SCCS variable, but only if it has no missing values---
dateobs<-SCCS$v838
dateobs<-dateobs[zdv]

#--look at frequencies and quartiles for the dep. varb.--
summary(depvar)
table(depvar)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
"fempower","sexratio","war","himilexp","wagelabor","famsize","settype",
"localjh","money","cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
"temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg",
"nonmatrel","lrgfam","malesexag","segadlboys","agrlateboy") ###ADDED

#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--create spatially lagged dep. varbs. in stage 1 OLS--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx   
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+dateobs+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg
+nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy,data=m9)
###ADDED, 
#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of restricted model--
xR<-lm(depvar ~ fyll + cultints + roots + fish + 
    exogamy + settype + femsubs, data = m9)

#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim","dateobs",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"fydd","wagelabor","war","himilexp","tree","foodtrade")


#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(e)$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc

#--write results to csv file for perusal in spreadsheet--
write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] B| Bad results

>  bbb
              coef Fstat          ddf pvalue   VIF
(Intercept) -9.045 0.655     161730.7  0.418    NA
fyll         1.355 7.610     154787.5  0.006 1.311
cultints     0.798 5.741   70617938.1  0.017 1.897
roots       -2.277 3.949 2890971866.4  0.047 1.210
fish         0.576 5.283  192095699.8  0.022 1.239
exogamy     -0.956 6.358   19935608.7  0.012 1.126
settype     -0.445 3.938   30335788.9  0.047 1.683
femsubs      0.634 4.096  191200873.7  0.043 1.242
>  r2
 R2:final model R2:IV(distance) R2:IV(language) 
      0.1082198       0.9343954       0.9704000 
>  ccc
                Fstat           df pvalue
RESET           0.604  1445741.938  0.437
Wald on restrs. 0.147       63.163  0.703
NCV             1.113  2052368.295  0.292
SWnormal        0.698 99565463.595  0.403
lagll           1.521   367252.484  0.217
lagdd           3.243  2508445.780  0.072
>  

[edit] A| Expand xR to xUR

These 2 programs took 48 seconds on my computer Doug 08:57, 1 October 2009 (PDT)
Program 1 --> Program 2
#MI--make the imputed datasets
#--change the following path to the directory with your data and program--
setwd("C:/My Documents/MI"))
rm(list=ls(all=TRUE))
options(echo=TRUE)
#--you need the following two packages--you must install them first--
library(foreign)
library(mice)
library(tripak)
library(zoo)
library(sp)
library(maptools)
library(spam)

#--To find the citation for a package, use this function:---
citation("mice")

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in auxiliary variables---
load("vaux.Rdata",.GlobalEnv)
row.names(vaux)<-NULL
#--Read in the SCCS dataset---
load("SCCS.Rdata",.GlobalEnv)

#--look at first 6 rows of vaux--
head(vaux)
#--look at field names of vaux--
names(vaux)
#--check to see that rows are properly aligned in the two datasets--
#--sum should equal 186---
sum((SCCS$socname==vaux$socname)*1)
#--remove the society name field--
vaux<-vaux[,-28]
names(vaux)

#--Two nominal variables: brg and rlg----
#--brg: consolidated Burton  Regions-----
#0 = (rest of world) circumpolar, South and Meso-America, west North America
#1 = Subsaharan Africa
#2 = Middle Old World
#3 = Southeast Asia, Insular Pacific, Sahul
#4 = Eastern Americas
#--rlg: Religion---
#'0 (no world religion)'  
#'1 (Christianity)'  
#'2 (Islam)'  
#'3 (Hindu/Buddhist)'  

#--check to see number of missing values in vaux, 
#--whether variables are numeric,
#--and number of discrete values for each variable---
vvn<-names(vaux)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(vaux[,vvn[i]])))
numeric<-is.numeric(vaux[,vvn[i]])
numDiscrVals<-length(table(vaux[,vvn[i]]))
pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
}
row.names(pp)<-vvn
pp

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--extract variables to be used from SCCS, put in dataframe fx--
fx<-data.frame(
socname=SCCS$socname,socID=SCCS$"sccs#",
valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
cultints=SCCS$v232,roots=(SCCS$v233==5)*1,
cereals=(SCCS$v233==6)*1,gath=SCCS$v203,hunt=SCCS$v204,
fish=SCCS$v205,anim=SCCS$v206,femsubs=SCCS$v890,
pigs=(SCCS$v244==2)*1,milk=(SCCS$v245>1)*1,plow=(SCCS$v243>1)*1,
bovines=(SCCS$v244==7)*1,tree=(SCCS$v233==4)*1,
foodtrade=SCCS$v819,foodscarc=SCCS$v1685,
ecorich=SCCS$v857,popdens=SCCS$v156,pathstress=SCCS$v1260,
CVrain=SCCS$v1914/SCCS$v1913,rain=SCCS$v854,temp=SCCS$v855,
AP1=SCCS$v921,AP2=SCCS$v928,ndrymonth=SCCS$v196,
exogamy=SCCS$v72,ncmallow=SCCS$v227,famsize=SCCS$v80,
settype=SCCS$v234,localjh=(SCCS$v236-1),superjh=SCCS$v237,
moralgods=SCCS$v238,fempower=SCCS$v663,
sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
war=SCCS$v1648,himilexp=(SCCS$v899==1)*1,
money=SCCS$v155,wagelabor=SCCS$v1732,
migr=(SCCS$v677==2)*1,brideprice=(SCCS$v208==1)*1,
nuclearfam=(SCCS$v210<=3)*1,pctFemPolyg=SCCS$v872,
nonmatrel=SCCS$v52,lrgfam=SCCS$v68,malesexag=SCCS$v175,
segadlboys=SCCS$v242,agrlateboy=SCCS$v300) ###ADDED)

#--look at first 6 rows of fx--
head(fx)

#--check to see number of missing values--
#--also check whether numeric--
vvn<-names(fx)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(fx[,vvn[i]])))
numeric<-is.numeric(fx[,vvn[i]])
pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
}
row.names(pp)<-vvn
pp

#--identify variables with missing values--
z<-which(pp[,1]>0)
zv1<-vvn[z]
zv1
#--identify variables with non-missing values--
z<-which(pp[,1]==0)
zv2<-vvn[z]
zv2

#-----------------------------
#----Multiple imputation------
#-----------------------------

#--number of imputed data sets to create--
nimp<-10
#--one at a time, loop through those variables with missing values--
for (i in 1:length(zv1)){
#--attach the imputand to the auxiliary data--
zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
#--in the following line, the imputation is done--
aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")
#--during first iteration of the loop, create dataframe impdat--
if (i==1){
impdat<-data.frame(aqq[,c(".id",".imp")])
}
#--the imputand is placed as a field in impdat and named--
impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
names(impdat)[NCOL(impdat)]<-zv1[i]
}

#--now the non-missing variables are attached to impdat--
gg<-NULL
for (i in 1:nimp){
gg<-rbind(gg,data.frame(fx[,zv2]))
}
impdat<-cbind(impdat,gg)

#--take a look at the top 6 and bottom 6 rows of impdat--
head(impdat)
tail(impdat)

#--impdat is saved as an R-format data file--
save(impdat,file="impdat.Rdata")


 
Program 2
#MI--estimate model with network-lagged dependent variables, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)

#HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#--can replace "sum" with "max"
depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) 
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
depvarname<-"child value"
#--can add additional SCCS variable, but only if it has no missing values---
dateobs<-SCCS$v838
dateobs<-dateobs[zdv]

#--look at frequencies and quartiles for the dep. varb.--
summary(depvar)
table(depvar)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
"fempower","sexratio","war","himilexp","wagelabor","famsize","settype",
"localjh","money","cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
"temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg",
"nonmatrel","lrgfam","malesexag","segadlboys","agrlateboy") ###ADDED

#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--create spatially lagged dep. varbs. in stage 1 OLS--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx   
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+dateobs+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg
+nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy,data=m9)
###ADDED, 
#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of restricted model--
###xR<-lm(depvar ~ fyll + cultints + roots + fish + 
###    exogamy + settype + femsubs, data = m9)
   xR<-lm(depvar ~ fyll + cultints + roots + fish + 
       exogamy + settype + femsubs
  +nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy,data = m9)
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim","dateobs",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"fydd","wagelabor","war","himilexp","tree","foodtrade")


#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(e)$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc

#--write results to csv file for perusal in spreadsheet--
write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] B| Good Results! That last version got it right for CHILDVAL - We forgot to change the depvar

the less aggression in late boys childtraining, the more childval!!! - We forgot to change the depvar

SO WE WILL SAVE THIS AND DO the new depvar = "rape" !!!
> bbb
                  this was still for childval
              coef Fstat        ddf pvalue   VIF
(Intercept) -6.878 0.324  38118.106  0.569    NA
fyll         1.353 6.914  19158.330  0.009 1.429 <--signif
cultints     0.794 5.513  99512.981  0.019 1.947 <--signif
roots       -2.242 3.744 308582.611  0.053 1.236 <--signif
fish         0.494 3.626  36441.669  0.057 1.315 <--signif
exogamy     -1.004 6.750  97331.882  0.009 1.166 <--signif
settype     -0.525 4.930  22218.899  0.026 1.851 <--signif
femsubs      0.639 3.887  38985.857  0.049 1.319 <--signif
nonmatrel    0.040 0.004    160.181  0.950 1.096 DROP IN NEXT VERSION OF THE PROGRAM A| below
lrgfam       0.078 0.388 110053.281  0.534 1.137 DROP IN NEXT VERSION OF THE PROGRAM A| below
malesexag    0.095 0.075     51.759  0.786 1.100 DROP IN NEXT VERSION OF THE PROGRAM A| below
segadlboys   0.075 0.043    921.371  0.836 1.199 DROP IN NEXT VERSION OF THE PROGRAM A| below
agrlateboy  -0.462 3.577   1055.501  0.059 1.118 <--signif
> r2
 R2:final model R2:IV(distance) R2:IV(language) 
      0.1390154       0.9343954       0.9704000 
> ccc
                Fstat         df pvalue
RESET           0.101  12932.650  0.750
Wald on restrs. 0.147     63.163  0.703
NCV             0.832   5575.436  0.362
SWnormal        0.263   8594.677  0.608
lagll           1.627 200298.416  0.202
lagdd           3.422 207826.011  0.064

>

[edit] A| ALL THE NEW VARIABLES in xR with RAPE the Depvar

Program 1 --> Program 2
#MI--make the imputed datasets
#--change the following path to the directory with your data and program--
setwd("C:/My Documents/MI"))
rm(list=ls(all=TRUE))
options(echo=TRUE)
#--you need the following two packages--you must install them first--
library(foreign)
library(mice)
library(tripak)
library(zoo)
library(sp)
library(maptools)
library(spam)

#--To find the citation for a package, use this function:---
citation("mice")

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in auxiliary variables---
load("vaux.Rdata",.GlobalEnv)
row.names(vaux)<-NULL
#--Read in the SCCS dataset---
load("SCCS.Rdata",.GlobalEnv)

#--look at first 6 rows of vaux--
head(vaux)
#--look at field names of vaux--
names(vaux)
#--check to see that rows are properly aligned in the two datasets--
#--sum should equal 186---
sum((SCCS$socname==vaux$socname)*1)
#--remove the society name field--
vaux<-vaux[,-28]
names(vaux)

#--Two nominal variables: brg and rlg----
#--brg: consolidated Burton  Regions-----
#0 = (rest of world) circumpolar, South and Meso-America, west North America
#1 = Subsaharan Africa
#2 = Middle Old World
#3 = Southeast Asia, Insular Pacific, Sahul
#4 = Eastern Americas
#--rlg: Religion---
#'0 (no world religion)'  
#'1 (Christianity)'  
#'2 (Islam)'  
#'3 (Hindu/Buddhist)'  

#--check to see number of missing values in vaux, 
#--whether variables are numeric,
#--and number of discrete values for each variable---
vvn<-names(vaux)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(vaux[,vvn[i]])))
numeric<-is.numeric(vaux[,vvn[i]])
numDiscrVals<-length(table(vaux[,vvn[i]]))
pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
}
row.names(pp)<-vvn
pp

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--extract variables to be used from SCCS, put in dataframe fx--
fx<-data.frame(
socname=SCCS$socname,socID=SCCS$"sccs#",
valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
cultints=SCCS$v232,roots=(SCCS$v233==5)*1,
cereals=(SCCS$v233==6)*1,gath=SCCS$v203,hunt=SCCS$v204,
fish=SCCS$v205,anim=SCCS$v206,femsubs=SCCS$v890,
pigs=(SCCS$v244==2)*1,milk=(SCCS$v245>1)*1,plow=(SCCS$v243>1)*1,
bovines=(SCCS$v244==7)*1,tree=(SCCS$v233==4)*1,
foodtrade=SCCS$v819,foodscarc=SCCS$v1685,
ecorich=SCCS$v857,popdens=SCCS$v156,pathstress=SCCS$v1260,
CVrain=SCCS$v1914/SCCS$v1913,rain=SCCS$v854,temp=SCCS$v855,
AP1=SCCS$v921,AP2=SCCS$v928,ndrymonth=SCCS$v196,
exogamy=SCCS$v72,ncmallow=SCCS$v227,famsize=SCCS$v80,
settype=SCCS$v234,localjh=(SCCS$v236-1),superjh=SCCS$v237,
moralgods=SCCS$v238,fempower=SCCS$v663,
sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
war=SCCS$v1648,himilexp=(SCCS$v899==1)*1,
money=SCCS$v155,wagelabor=SCCS$v1732,
migr=(SCCS$v677==2)*1,brideprice=(SCCS$v208==1)*1,
nuclearfam=(SCCS$v210<=3)*1,pctFemPolyg=SCCS$v872,
nonmatrel=SCCS$v52,lrgfam=SCCS$v68,malesexag=SCCS$v175,
segadlboys=SCCS$v242,agrlateboy=SCCS$v300) ###ADDED)

#--look at first 6 rows of fx--
head(fx)

#--check to see number of missing values--
#--also check whether numeric--
vvn<-names(fx)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(fx[,vvn[i]])))
numeric<-is.numeric(fx[,vvn[i]])
pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
}
row.names(pp)<-vvn
pp

#--identify variables with missing values--
z<-which(pp[,1]>0)
zv1<-vvn[z]
zv1
#--identify variables with non-missing values--
z<-which(pp[,1]==0)
zv2<-vvn[z]
zv2

#-----------------------------
#----Multiple imputation------
#-----------------------------

#--number of imputed data sets to create--
nimp<-10
#--one at a time, loop through those variables with missing values--
for (i in 1:length(zv1)){
#--attach the imputand to the auxiliary data--
zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
#--in the following line, the imputation is done--
aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")
#--during first iteration of the loop, create dataframe impdat--
if (i==1){
impdat<-data.frame(aqq[,c(".id",".imp")])
}
#--the imputand is placed as a field in impdat and named--
impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
names(impdat)[NCOL(impdat)]<-zv1[i]
}

#--now the non-missing variables are attached to impdat--
gg<-NULL
for (i in 1:nimp){
gg<-rbind(gg,data.frame(fx[,zv2]))
}
impdat<-cbind(impdat,gg)

#--take a look at the top 6 and bottom 6 rows of impdat--
head(impdat)
tail(impdat)

#--impdat is saved as an R-format data file--
save(impdat,file="impdat.Rdata")


 
Program 2
#MI--estimate model with network-lagged dependent variables, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)

#HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#--can replace "sum" with "max"
###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) 
depvar<-SCCS$v667###NEW
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
###depvarname<-"childvar"
depvarname<-"rape"
#--can add additional SCCS variable, but only if it has no missing values---
dateobs<-SCCS$v838
dateobs<-dateobs[zdv]

#--look at frequencies and quartiles for the dep. varb.--
summary(depvar)
table(depvar)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
"fempower","sexratio","war","himilexp","wagelabor","famsize","settype",
"localjh","money","cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
"temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg",
"nonmatrel","lrgfam","malesexag","segadlboys","agrlateboy") ###ADDED

#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--create spatially lagged dep. varbs. in stage 1 OLS--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx   
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+dateobs+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg
+nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy,data=m9)
###ADDED, 
#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of restricted model--
###xR<-lm(depvar ~ fyll + cultints + roots + fish + 
###    exogamy + settype + femsubs, data = m9)
   xR<-lm(depvar ~ fyll + cultints + roots + fish + 
       exogamy + settype + femsubs
  +nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy,data = m9)
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim","dateobs",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"fydd","wagelabor","war","himilexp","tree","foodtrade")


#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(e)$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc

#--write results to csv file for perusal in spreadsheet--
write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] B| results for rape (for a limited set of indepvars only)

2SLS model for rape	
> bbb
              coef Fstat       ddf pvalue   VIF
(Intercept)  1.198 0.662  1156.531  0.416    NA
fyll        -0.174 0.044  1104.914  0.835 1.509
cultints     0.063 2.439 31954.471  0.118 1.804
roots       -0.052 0.104  4688.005  0.748 1.602
fish        -0.012 0.121  4407.942  0.728 1.490
exogamy     -0.047 0.752 52789.765  0.386 1.543
settype     -0.011 0.155 31066.044  0.693 1.862
femsubs      0.009 0.041  2480.530  0.840 1.715
nonmatrel    0.040 0.268   418.437  0.605 1.212
lrgfam       0.007 0.196  7655.924  0.658 1.165
malesexag    0.058 1.782    75.547  0.186 1.193
segadlboys  -0.009 0.054  1507.425  0.816 1.282
agrlateboy   0.042 1.767   610.245  0.184 1.186
> r2
 R2:final model R2:IV(distance) R2:IV(language) 
      0.1247379       0.9211960       0.9413928 
> ccc
                 Fstat         df pvalue
RESET            0.085   5105.658  0.770
Wald on restrs. -0.161    128.646  1.000
NCV              0.256   1155.795  0.613
SWnormal        20.914    493.655  0.000
lagll            0.603  82601.983  0.437
lagdd            0.218 109176.578  0.641

[edit] A| GOOD START HERE FOR OTHER PROJECTS Expanded xR to full xUR- ALL indvars w/ depvar="rape"

Program 1 --> Program 2
#MI--make the imputed datasets
#--change the following path to the directory with your data and program--
setwd("C:/My Documents/MI")
rm(list=ls(all=TRUE))
options(echo=TRUE)
#--you need the following two packages--you must install them first--
library(foreign)
library(mice)
library(tripak)
library(zoo)
library(sp)
library(maptools)
library(spam)

#--To find the citation for a package, use this function:---
citation("mice")

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in auxiliary variables---
load("vaux.Rdata",.GlobalEnv)
row.names(vaux)<-NULL
#--Read in the SCCS dataset---
load("SCCS.Rdata",.GlobalEnv)

#--look at first 6 rows of vaux--
head(vaux)
#--look at field names of vaux--
names(vaux)
#--check to see that rows are properly aligned in the two datasets--
#--sum should equal 186---
sum((SCCS$socname==vaux$socname)*1)
#--remove the society name field--
vaux<-vaux[,-28]
names(vaux)

#--Two nominal variables: brg and rlg----
#--brg: consolidated Burton  Regions-----
#0 = (rest of world) circumpolar, South and Meso-America, west North America
#1 = Subsaharan Africa
#2 = Middle Old World
#3 = Southeast Asia, Insular Pacific, Sahul
#4 = Eastern Americas
#--rlg: Religion---
#'0 (no world religion)'  
#'1 (Christianity)'  
#'2 (Islam)'  
#'3 (Hindu/Buddhist)'  

#--check to see number of missing values in vaux, 
#--whether variables are numeric,
#--and number of discrete values for each variable---
vvn<-names(vaux)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(vaux[,vvn[i]])))
numeric<-is.numeric(vaux[,vvn[i]])
numDiscrVals<-length(table(vaux[,vvn[i]]))
pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
}
row.names(pp)<-vvn
pp

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--extract variables to be used from SCCS, put in dataframe fx--
fx<-data.frame(
socname=SCCS$socname,socID=SCCS$"sccs#",
valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
dateobs=SCCS$v838,cultints=SCCS$v232,roots=(SCCS$v233==5)*1,
cereals=(SCCS$v233==6)*1,gath=SCCS$v203,hunt=SCCS$v204,
fish=SCCS$v205,anim=SCCS$v206,femsubs=SCCS$v890,
pigs=(SCCS$v244==2)*1,milk=(SCCS$v245>1)*1,plow=(SCCS$v243>1)*1,
bovines=(SCCS$v244==7)*1,tree=(SCCS$v233==4)*1,
foodtrade=SCCS$v819,foodscarc=SCCS$v1685,
ecorich=SCCS$v857,popdens=SCCS$v156,pathstress=SCCS$v1260,
CVrain=SCCS$v1914/SCCS$v1913,rain=SCCS$v854,temp=SCCS$v855,
AP1=SCCS$v921,AP2=SCCS$v928,ndrymonth=SCCS$v196,
exogamy=SCCS$v72,ncmallow=SCCS$v227, ### famsize=SCCS$v80,
settype=SCCS$v234,localjh=(SCCS$v236-1),superjh=SCCS$v237,
moralgods=SCCS$v238,fempower=SCCS$v663,
sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
war=SCCS$v1648,himilexp=(SCCS$v899==1)*1,
money=SCCS$v155,wagelabor=SCCS$v1732,
migr=(SCCS$v677==2)*1,brideprice=(SCCS$v208==1)*1,
nuclearfam=(SCCS$v210<=3)*1,pctFemPolyg=SCCS$v872,
nonmatrel=SCCS$v52,lrgfam=SCCS$v68,malesexag=SCCS$v175,
segadlboys=SCCS$v242,agrlateboy=SCCS$v300) ###ADDED

#--look at first 6 rows of fx--
head(fx)

#--check to see number of missing values--
#--also check whether numeric--
vvn<-names(fx)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(fx[,vvn[i]])))
numeric<-is.numeric(fx[,vvn[i]])
pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
}
row.names(pp)<-vvn
pp

#--identify variables with missing values--
z<-which(pp[,1]>0)
zv1<-vvn[z]
zv1
#--identify variables with non-missing values--
z<-which(pp[,1]==0)
zv2<-vvn[z]
zv2

#-----------------------------
#----Multiple imputation------
#-----------------------------

#--number of imputed data sets to create--
nimp<-10
#--one at a time, loop through those variables with missing values--
for (i in 1:length(zv1)){
#--attach the imputand to the auxiliary data--
zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
#--in the following line, the imputation is done--
aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")
#--during first iteration of the loop, create dataframe impdat--
if (i==1){
impdat<-data.frame(aqq[,c(".id",".imp")])
}
#--the imputand is placed as a field in impdat and named--
impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
names(impdat)[NCOL(impdat)]<-zv1[i]
}

#--now the non-missing variables are attached to impdat--
gg<-NULL
for (i in 1:nimp){
gg<-rbind(gg,data.frame(fx[,zv2]))
}
impdat<-cbind(impdat,gg)

#--take a look at the top 6 and bottom 6 rows of impdat--
head(impdat)
tail(impdat)

#--impdat is saved as an R-format data file--
save(impdat,file="impdat.Rdata")


 
Program 2
#MI--estimate model with network-lagged dependent variables, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)

#HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#--can replace "sum" with "max"
###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) 
depvar<-SCCS$v667###NEW
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
###depvarname<-"childvar"
depvarname<-"rape"
#--can add additional SCCS variable, but only if it has no missing values---
#dateobs<-SCCS$v838
#dateobs<-dateobs[zdv]

#--look at frequencies and quartiles for the dep. varb.--
summary(depvar)
table(depvar)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
"fempower","sexratio","war","himilexp","wagelabor","settype", #1# "famsize", 
"localjh","money","cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade","dateobs",
"ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
"temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg",
"nonmatrel","lrgfam","malesexag","segadlboys","agrlateboy") ###ADDED

#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--create spatially lagged dep. varbs. in stage 1 OLS--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx   
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+dateobs+            
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+ ### famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg+
nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy,data=m9) ###ADDED 
#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of restricted model--
###xR<-lm(depvar ~ fyll + cultints + roots + fish + 
###    exogamy + settype + femsubs, data = m9)
   xR<-lm(depvar~fyll+fydd+dateobs+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+  ###famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg
+nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy
, data = m9) ###ADDED 
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim","dateobs",
"pigs","milk","bovines","foodscarc","ecorich","localjh", #1# "famsize", 
"superjh","moralgods","fempower","sexratio","money",
"fydd","wagelabor","war","himilexp","tree","foodtrade")


#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(e)$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc
#Corrected to publication version with depvarname
#--write results to csv file for perusal in spreadsheet--
write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] B| Good xUR Unrestricted Results for rape

Note: Doug 12:57, 17 October 2009 (PDT) fixed remaining problems which were with dateobs (for most of our EduMod pages this is NOT a problem), see the code above. Doug 11:54, 21 October 2009 (PDT) I put a leading blank into these lines so they would appear correctly in this wiki module. These are now the right results for an xUR Unrestricted model where you can see the significance of ALL the variables. (Tho none happend to be significant).

>  bbb
              2SLS model for rape
              coef Fstat       ddf pvalue    VIF
(Intercept)  2.562 0.336   167.139  0.563     NA
fyll        -0.779 0.288  1083.896  0.592  4.020
fydd         0.448 0.287  2671.850  0.592  3.012
dateobs      0.000 0.043   128.399  0.836  3.595
cultints     0.085 0.727   966.792  0.394  9.227
roots        0.220 0.259   435.561  0.611  9.420
cereals      0.331 0.587   477.379  0.444 12.597
gath         0.012 0.025  1614.886  0.874  4.748
plow         0.217 0.304   814.607  0.581  4.793
hunt         0.063 0.545  4253.794  0.460  7.640
fish        -0.010 0.018   440.129  0.893  5.969
anim         0.031 0.137  2549.142  0.711 10.677
pigs        -0.153 0.305  1748.048  0.581  3.814
milk         0.103 0.086  3238.293  0.769  6.897
bovines     -0.201 0.388  3475.813  0.533  5.909
tree         0.444 0.707   347.629  0.401  6.841
foodtrade    0.005 0.447  3496.470  0.504  1.789
foodscarc    0.016 0.067   338.090  0.796  1.687
ecorich     -0.029 0.120   317.220  0.729  3.458
popdens     -0.015 0.034 11862.371  0.853  5.462
pathstress  -0.026 0.788 11684.485  0.375  3.863
exogamy     -0.015 0.043   919.170  0.836  2.224
ncmallow    -0.025 0.510   294.359  0.476  2.136
famsize     -0.068 0.033  2624.650  0.856 68.830
settype      0.034 0.257   489.409  0.613  8.019
localjh     -0.022 0.016   649.662  0.900  3.094
superjh     -0.155 2.114  1105.436  0.146  4.040
moralgods   -0.055 0.392   541.527  0.532  2.699
fempower     0.005 0.010   542.865  0.919  2.131
femsubs      0.013 0.036  7385.257  0.850  3.261
sexratio    -0.107 0.431    53.240  0.514  2.186
war          0.009 0.397   877.498  0.529  2.897
himilexp    -0.080 0.190   419.162  0.663  2.428
money        0.058 0.704  1884.435  0.402  2.744
wagelabor    0.036 0.078    31.348  0.782  2.120
migr         0.052 0.054    63.238  0.816  2.556
brideprice  -0.082 0.140   390.846  0.708  3.323
nuclearfam  -0.258 1.057   379.763  0.305  3.974
pctFemPolyg  0.003 0.777   229.551  0.379  2.663
nonmatrel    0.046 0.207  1436.922  0.649  1.961
lrgfam       0.003 0.000  1891.865  0.983 66.072
malesexag    0.083 1.887    66.098  0.174  1.922
segadlboys   0.001 0.000   116.689  0.992  2.308
agrlateboy  -0.001 0.001   724.970  0.977  2.625
>  r2
 R2:final model R2:IV(distance) R2:IV(language) 
      0.3827469       0.9280168       0.9421643 
>  ccc
                 Fstat          df pvalue
RESET            1.817    1838.106  0.178
Wald on restrs. -0.002  116384.622  1.000
NCV              0.622    3242.557  0.430
SWnormal         2.379     141.466  0.125
lagll            0.956 1332831.121  0.328
lagdd            1.200   24711.297  0.273

[edit] B| Second run by DRW same results

>  bbb
              coef Fstat       ddf pvalue    VIF
(Intercept)  2.919 0.484   638.171  0.487     NA
fyll        -0.847 0.371  1345.189  0.543  3.694
fydd         0.398 0.223  1174.360  0.637  2.974
dateobs      0.000 0.065   898.498  0.799  3.453
cultints     0.083 0.683   849.922  0.409  9.211
roots        0.310 0.556  1529.567  0.456  9.206
cereals      0.385 0.833   816.689  0.362 12.293
gath         0.024 0.101  3185.971  0.750  4.731
plow         0.186 0.237  1992.792  0.626  4.689
hunt         0.071 0.674  1129.967  0.412  7.560
fish        -0.005 0.005   952.654  0.946  6.021
anim         0.062 0.577  2454.294  0.448 10.027
pigs        -0.182 0.431  2544.667  0.511  3.848
milk         0.055 0.025  1723.432  0.874  6.664
bovines     -0.210 0.391   706.323  0.532  6.006
tree         0.530 1.135  2184.635  0.287  6.627
foodtrade    0.005 0.431   828.821  0.512  1.724
foodscarc    0.025 0.160   397.205  0.689  1.668
ecorich     -0.018 0.053  8887.604  0.818  3.393
popdens     -0.021 0.066  1554.604  0.797  5.092
pathstress  -0.032 1.148 10413.790  0.284  3.893
exogamy     -0.032 0.205  1949.616  0.651  2.159
ncmallow    -0.028 0.628   557.319  0.428  2.153
settype      0.037 0.352  3992.402  0.553  7.697
localjh     -0.050 0.091  3871.302  0.763  2.900
superjh     -0.172 2.644  1377.984  0.104  3.983
moralgods   -0.047 0.290   531.211  0.590  2.639
fempower     0.011 0.057   457.742  0.812  1.982
femsubs      0.015 0.050 34109.757  0.823  3.255
sexratio    -0.126 0.554    34.995  0.462  2.040
war          0.010 0.522   275.716  0.471  2.978
himilexp    -0.092 0.254  1431.797  0.614  2.513
money        0.074 1.144  2203.151  0.285  2.719
wagelabor    0.045 0.216   281.808  0.642  2.063
migr         0.019 0.011   192.421  0.917  2.192
brideprice  -0.098 0.220  1701.619  0.639  3.252
nuclearfam  -0.299 1.547  1047.539  0.214  3.878
pctFemPolyg  0.003 0.632   295.637  0.427  2.676
nonmatrel    0.021 0.040   337.474  0.841  1.971
lrgfam      -0.021 0.624  5254.490  0.430  2.886
malesexag    0.064 1.251   192.030  0.265  2.087
segadlboys   0.010 0.030  6930.363  0.862  2.218
agrlateboy  -0.008 0.028   788.140  0.868  2.667
>  r2
 R2:final model R2:IV(distance) R2:IV(language) 
      0.3691837       0.9304507       0.9388959 
>  ccc
                 Fstat        df pvalue
RESET            1.938   170.002  0.166
Wald on restrs. -0.111   289.179  1.000
NCV              0.543   297.300  0.462
SWnormal         2.870   544.440  0.091
lagll            0.862 13884.498  0.353
lagdd            1.159 40410.388  0.282

[edit] B| Now list the Significant <.05 and Nonsignificant variables

>  bbb
              2SLS model for rape
              coef Fstat       ddf pvalue    VIF
(Intercept)  2.562 0.336   167.139  0.563     NA
fyll        -0.779 0.288  1083.896  0.592  4.020
fydd         0.448 0.287  2671.850  0.592  3.012
dateobs      0.000 0.043   128.399  0.836  3.595
gath         0.012 0.025  1614.886  0.874  4.748
plow         0.217 0.304   814.607  0.581  4.793
hunt         0.063 0.545  4253.794  0.460  7.640
fish        -0.010 0.018   440.129  0.893  5.969
pigs        -0.153 0.305  1748.048  0.581  3.814
foodtrade    0.005 0.447  3496.470  0.504  1.789
foodscarc    0.016 0.067   338.090  0.796  1.687
ecorich     -0.029 0.120   317.220  0.729  3.458
popdens     -0.015 0.034 11862.371  0.853  5.462
pathstress  -0.026 0.788 11684.485  0.375  3.863
exogamy     -0.015 0.043   919.170  0.836  2.224
ncmallow    -0.025 0.510   294.359  0.476  2.136
localjh     -0.022 0.016   649.662  0.900  3.094
superjh     -0.155 2.114  1105.436  0.146  4.040
moralgods   -0.055 0.392   541.527  0.532  2.699
fempower     0.005 0.010   542.865  0.919  2.131
femsubs      0.013 0.036  7385.257  0.850  3.261
sexratio    -0.107 0.431    53.240  0.514  2.186
war          0.009 0.397   877.498  0.529  2.897
himilexp    -0.080 0.190   419.162  0.663  2.428
money        0.058 0.704  1884.435  0.402  2.744
wagelabor    0.036 0.078    31.348  0.782  2.120
migr         0.052 0.054    63.238  0.816  2.556
brideprice  -0.082 0.140   390.846  0.708  3.323
nuclearfam  -0.258 1.057   379.763  0.305  3.974
pctFemPolyg  0.003 0.777   229.551  0.379  2.663
nonmatrel    0.046 0.207  1436.922  0.649  1.961
malesexag    0.083 1.887    66.098  0.174  1.922
segadlboys   0.001 0.000   116.689  0.992  2.308
agrlateboy  -0.001 0.001   724.970  0.977  2.625

None significant but drop these as absurdly high VIF 
cultints     0.085 0.727   966.792  0.394  9.227
roots        0.220 0.259   435.561  0.611  9.420
milk         0.103 0.086  3238.293  0.769  6.897
bovines     -0.201 0.388  3475.813  0.533  5.909
tree         0.444 0.707   347.629  0.401  6.841
cereals      0.331 0.587   477.379  0.444 12.597
anim         0.031 0.137  2549.142  0.711 10.677
famsize     -0.068 0.033  2624.650  0.856 68.830
lrgfam       0.003 0.000  1891.865  0.983 66.072
settype      0.034 0.257   489.409  0.613  8.019

[edit] A| Shrink xR to drop high VIF for "rape" as depvar

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of restricted model--
###xR<-lm(depvar ~ fyll + cultints + roots + fish + 
###    exogamy + settype + femsubs, data = m9)
   xR<-lm(depvar~fyll+fydd+dateobs+
cereals+gath+plow+                            ###cultints+roots+
hunt+fish+pigs+foodtrade+foodscarc+           ###anim+tree+milk+bovines+
ecorich+popdens+pathstress+exogamy+ncmallow+ ###famsize+
localjh+superjh+moralgods+fempower+femsubs+   ###settype+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg
+nonmatrel+malesexag+segadlboys+agrlateboy    ###lrgfam+
, data = m9)


Program 1 --> Program 2
#MI--make the imputed datasets
#--change the following path to the directory with your data and program--
setwd("C:/My Documents/MI"))
rm(list=ls(all=TRUE))
options(echo=TRUE)
#--you need the following two packages--you must install them first--
library(foreign)
library(mice)
library(tripak)
library(zoo)
library(sp)
library(maptools)
library(spam)

#--To find the citation for a package, use this function:---
citation("mice")

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in auxiliary variables---
load("vaux.Rdata",.GlobalEnv)
row.names(vaux)<-NULL
#--Read in the SCCS dataset---
load("SCCS.Rdata",.GlobalEnv)

#--look at first 6 rows of vaux--
head(vaux)
#--look at field names of vaux--
names(vaux)
#--check to see that rows are properly aligned in the two datasets--
#--sum should equal 186---
sum((SCCS$socname==vaux$socname)*1)
#--remove the society name field--
vaux<-vaux[,-28]
names(vaux)

#--Two nominal variables: brg and rlg----
#--brg: consolidated Burton  Regions-----
#0 = (rest of world) circumpolar, South and Meso-America, west North America
#1 = Subsaharan Africa
#2 = Middle Old World
#3 = Southeast Asia, Insular Pacific, Sahul
#4 = Eastern Americas
#--rlg: Religion---
#'0 (no world religion)'  
#'1 (Christianity)'  
#'2 (Islam)'  
#'3 (Hindu/Buddhist)'  

#--check to see number of missing values in vaux, 
#--whether variables are numeric,
#--and number of discrete values for each variable---
vvn<-names(vaux)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(vaux[,vvn[i]])))
numeric<-is.numeric(vaux[,vvn[i]])
numDiscrVals<-length(table(vaux[,vvn[i]]))
pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
}
row.names(pp)<-vvn
pp

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--extract variables to be used from SCCS, put in dataframe fx--
fx<-data.frame(
socname=SCCS$socname,socID=SCCS$"sccs#",
valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
dateobs=SCCS$v838,cultints=SCCS$v232,roots=(SCCS$v233==5)*1,
cereals=(SCCS$v233==6)*1,gath=SCCS$v203,hunt=SCCS$v204,
fish=SCCS$v205,anim=SCCS$v206,femsubs=SCCS$v890,
pigs=(SCCS$v244==2)*1,milk=(SCCS$v245>1)*1,plow=(SCCS$v243>1)*1,
bovines=(SCCS$v244==7)*1,tree=(SCCS$v233==4)*1,
foodtrade=SCCS$v819,foodscarc=SCCS$v1685,
ecorich=SCCS$v857,popdens=SCCS$v156,pathstress=SCCS$v1260,
CVrain=SCCS$v1914/SCCS$v1913,rain=SCCS$v854,temp=SCCS$v855,
AP1=SCCS$v921,AP2=SCCS$v928,ndrymonth=SCCS$v196,
exogamy=SCCS$v72,ncmallow=SCCS$v227,famsize=SCCS$v80,
settype=SCCS$v234,localjh=(SCCS$v236-1),superjh=SCCS$v237,
moralgods=SCCS$v238,fempower=SCCS$v663,
sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
war=SCCS$v1648,himilexp=(SCCS$v899==1)*1,
money=SCCS$v155,wagelabor=SCCS$v1732,
migr=(SCCS$v677==2)*1,brideprice=(SCCS$v208==1)*1,
nuclearfam=(SCCS$v210<=3)*1,pctFemPolyg=SCCS$v872,
nonmatrel=SCCS$v52,lrgfam=SCCS$v68,malesexag=SCCS$v175,
segadlboys=SCCS$v242,agrlateboy=SCCS$v300) ###ADDED

#--look at first 6 rows of fx--
head(fx)

#--check to see number of missing values--
#--also check whether numeric--
vvn<-names(fx)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(fx[,vvn[i]])))
numeric<-is.numeric(fx[,vvn[i]])
pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
}
row.names(pp)<-vvn
pp

#--identify variables with missing values--
z<-which(pp[,1]>0)
zv1<-vvn[z]
zv1
#--identify variables with non-missing values--
z<-which(pp[,1]==0)
zv2<-vvn[z]
zv2

#-----------------------------
#----Multiple imputation------
#-----------------------------

#--number of imputed data sets to create--
nimp<-10
#--one at a time, loop through those variables with missing values--
for (i in 1:length(zv1)){
#--attach the imputand to the auxiliary data--
zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
#--in the following line, the imputation is done--
aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")
#--during first iteration of the loop, create dataframe impdat--
if (i==1){
impdat<-data.frame(aqq[,c(".id",".imp")])
}
#--the imputand is placed as a field in impdat and named--
impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
names(impdat)[NCOL(impdat)]<-zv1[i]
}

#--now the non-missing variables are attached to impdat--
gg<-NULL
for (i in 1:nimp){
gg<-rbind(gg,data.frame(fx[,zv2]))
}
impdat<-cbind(impdat,gg)

#--take a look at the top 6 and bottom 6 rows of impdat--
head(impdat)
tail(impdat)

#--impdat is saved as an R-format data file--
save(impdat,file="impdat.Rdata")


 
Program 2
#MI--estimate model with network-lagged dependent variables, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)

#HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#--can replace "sum" with "max"
###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) 
depvar<-SCCS$v667###NEW
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
###depvarname<-"childvar"
depvarname<-"rape"
#--can add additional SCCS variable, but only if it has no missing values---
#dateobs<-SCCS$v838
#dateobs<-dateobs[zdv]

#--look at frequencies and quartiles for the dep. varb.--
summary(depvar)
table(depvar)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
"fempower","sexratio","war","himilexp","wagelabor","famsize","settype",
"localjh","money","cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade","dateobs",
"ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
"temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg",
"nonmatrel","lrgfam","malesexag","segadlboys","agrlateboy") ###ADDED

#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--create spatially lagged dep. varbs. in stage 1 OLS--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx   
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+dateobs+            
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg+
nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy,data=m9) ###ADDED 
#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of restricted model--
###xR<-lm(depvar ~ fyll + cultints + roots + fish + 
###    exogamy + settype + femsubs, data = m9)
   xR<-lm(depvar~fyll+fydd+dateobs+
cereals+gath+plow+                            ###cultints+roots+
hunt+fish+pigs+foodtrade+foodscarc+           ###anim+tree+milk+bovines+
ecorich+popdens+pathstress+exogamy+ncmallow+ ###famsize+
localjh+superjh+moralgods+fempower+femsubs+   ###settype+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg
+nonmatrel+malesexag+segadlboys+agrlateboy    ###lrgfam+
, data = m9) ###ADDED 
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim","dateobs",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"fydd","wagelabor","war","himilexp","tree","foodtrade")


#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(e)$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc
#Corrected to publication version with depvarname
#--write results to csv file for perusal in spreadsheet--
write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] B| No mistake, oddly enough: None of your independent variables are significant !!!

>  bbb
              coef Fstat       ddf pvalue   VIF
(Intercept)  3.979 1.075   185.744  0.301    NA
fyll        -0.916 0.529  2906.352  0.467 3.411
fydd         0.226 0.122  6500.093  0.726 1.989
dateobs     -0.001 0.160   168.144  0.690 3.040
cereals      0.177 1.066 13896.531  0.302 2.401
gath        -0.020 0.134  1122.820  0.714 2.610
plow         0.149 0.246   532.379  0.620 2.999
hunt        -0.016 0.095  2466.479  0.758 3.231
fish        -0.040 0.514   378.080  0.474 3.599
pigs        -0.117 0.261   940.034  0.609 2.781
foodtrade    0.005 0.578  4149.915  0.447 1.572
foodscarc    0.010 0.030   262.167  0.862 1.500
ecorich     -0.040 0.300   320.110  0.584 2.814
popdens      0.028 0.167 42490.153  0.683 4.190
pathstress  -0.016 0.396 14100.168  0.529 3.049
exogamy     -0.042 0.448  1915.890  0.503 1.883
ncmallow    -0.020 0.434   263.560  0.511 1.729
localjh     -0.044 0.085   926.579  0.771 2.590
superjh     -0.141 2.171   586.790  0.141 3.449
moralgods   -0.066 0.717   683.850  0.397 2.306
fempower    -0.020 0.331   576.806  0.565 1.389
femsubs      0.025 0.190 10023.292  0.663 2.554
sexratio    -0.069 0.221    43.120  0.641 1.762
war          0.003 0.047   824.329  0.828 2.627
himilexp     0.034 0.050   986.465  0.823 1.869
money        0.039 0.366   776.947  0.546 2.516
wagelabor   -0.004 0.001    36.801  0.971 1.742
migr         0.128 0.544   212.710  0.462 2.084
brideprice  -0.056 0.085   606.510  0.770 2.905
nuclearfam  -0.169 0.760   775.476  0.383 2.740
pctFemPolyg  0.003 0.965   364.196  0.327 2.210
nonmatrel    0.063 0.567  7213.493  0.451 1.534
malesexag    0.078 1.772    39.797  0.191 1.675
segadlboys  -0.005 0.007   168.421  0.932 1.934
agrlateboy   0.016 0.125   558.193  0.724 2.281
>  r2
 R2:final model R2:IV(distance) R2:IV(language) 
      0.3363051       0.9280168       0.9421643 
>  ccc
                 Fstat          df pvalue
RESET            0.850     970.108  0.357
Wald on restrs. -0.002  116384.622  1.000
NCV              0.114    1564.018  0.736
SWnormal         3.470      96.948  0.066
lagll            0.831 1726488.173  0.362
lagdd            0.841   23156.454  0.359
> 

[edit] A! So drop everything but your malesexag and superjh, lowest p<.20

Program 1 --> Program 2
#MI--make the imputed datasets
#--change the following path to the directory with your data and program--
setwd("C:/My Documents/MI")
rm(list=ls(all=TRUE))
options(echo=TRUE)
#--you need the following two packages--you must install them first--
library(foreign)
library(mice)
library(tripak)
library(zoo)
library(sp)
library(maptools)
library(spam)

#--To find the citation for a package, use this function:---
citation("mice")

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in auxiliary variables---
load("vaux.Rdata",.GlobalEnv)
row.names(vaux)<-NULL
#--Read in the SCCS dataset---
load("SCCS.Rdata",.GlobalEnv)

#--look at first 6 rows of vaux--
head(vaux)
#--look at field names of vaux--
names(vaux)
#--check to see that rows are properly aligned in the two datasets--
#--sum should equal 186---
sum((SCCS$socname==vaux$socname)*1)
#--remove the society name field--
vaux<-vaux[,-28]
names(vaux)

#--Two nominal variables: brg and rlg----
#--brg: consolidated Burton  Regions-----
#0 = (rest of world) circumpolar, South and Meso-America, west North America
#1 = Subsaharan Africa
#2 = Middle Old World
#3 = Southeast Asia, Insular Pacific, Sahul
#4 = Eastern Americas
#--rlg: Religion---
#'0 (no world religion)'  
#'1 (Christianity)'  
#'2 (Islam)'  
#'3 (Hindu/Buddhist)'  

#--check to see number of missing values in vaux, 
#--whether variables are numeric,
#--and number of discrete values for each variable---
vvn<-names(vaux)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(vaux[,vvn[i]])))
numeric<-is.numeric(vaux[,vvn[i]])
numDiscrVals<-length(table(vaux[,vvn[i]]))
pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
}
row.names(pp)<-vvn
pp

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--extract variables to be used from SCCS, put in dataframe fx--
fx<-data.frame(
socname=SCCS$socname,socID=SCCS$"sccs#",
valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
dateobs=SCCS$v838,cultints=SCCS$v232,roots=(SCCS$v233==5)*1,
cereals=(SCCS$v233==6)*1,gath=SCCS$v203,hunt=SCCS$v204,
fish=SCCS$v205,anim=SCCS$v206,femsubs=SCCS$v890,
pigs=(SCCS$v244==2)*1,milk=(SCCS$v245>1)*1,plow=(SCCS$v243>1)*1,
bovines=(SCCS$v244==7)*1,tree=(SCCS$v233==4)*1,
foodtrade=SCCS$v819,foodscarc=SCCS$v1685,
ecorich=SCCS$v857,popdens=SCCS$v156,pathstress=SCCS$v1260,
CVrain=SCCS$v1914/SCCS$v1913,rain=SCCS$v854,temp=SCCS$v855,
AP1=SCCS$v921,AP2=SCCS$v928,ndrymonth=SCCS$v196,
exogamy=SCCS$v72,ncmallow=SCCS$v227,famsize=SCCS$v80,
settype=SCCS$v234,localjh=(SCCS$v236-1),superjh=SCCS$v237,
moralgods=SCCS$v238,fempower=SCCS$v663,
sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
war=SCCS$v1648,himilexp=(SCCS$v899==1)*1,
money=SCCS$v155,wagelabor=SCCS$v1732,
migr=(SCCS$v677==2)*1,brideprice=(SCCS$v208==1)*1,
nuclearfam=(SCCS$v210<=3)*1,pctFemPolyg=SCCS$v872,
nonmatrel=SCCS$v52,lrgfam=SCCS$v68,malesexag=SCCS$v175,
segadlboys=SCCS$v242,agrlateboy=SCCS$v300) ###ADDED

#--look at first 6 rows of fx--
head(fx)

#--check to see number of missing values--
#--also check whether numeric--
vvn<-names(fx)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(fx[,vvn[i]])))
numeric<-is.numeric(fx[,vvn[i]])
pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
}
row.names(pp)<-vvn
pp

#--identify variables with missing values--
z<-which(pp[,1]>0)
zv1<-vvn[z]
zv1
#--identify variables with non-missing values--
z<-which(pp[,1]==0)
zv2<-vvn[z]
zv2

#-----------------------------
#----Multiple imputation------
#-----------------------------

#--number of imputed data sets to create--
nimp<-10
#--one at a time, loop through those variables with missing values--
for (i in 1:length(zv1)){
#--attach the imputand to the auxiliary data--
zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
#--in the following line, the imputation is done--
aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")
#--during first iteration of the loop, create dataframe impdat--
if (i==1){
impdat<-data.frame(aqq[,c(".id",".imp")])
}
#--the imputand is placed as a field in impdat and named--
impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
names(impdat)[NCOL(impdat)]<-zv1[i]
}

#--now the non-missing variables are attached to impdat--
gg<-NULL
for (i in 1:nimp){
gg<-rbind(gg,data.frame(fx[,zv2]))
}
impdat<-cbind(impdat,gg)

#--take a look at the top 6 and bottom 6 rows of impdat--
head(impdat)
tail(impdat)

#--impdat is saved as an R-format data file--
save(impdat,file="impdat.Rdata")


 
Program 2
#MI--estimate model with network-lagged dependent variables, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)

#HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#--can replace "sum" with "max"
###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) 
depvar<-SCCS$v667###NEW
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
###depvarname<-"childvar"
depvarname<-"rape"
#--can add additional SCCS variable, but only if it has no missing values---
#dateobs<-SCCS$v838
#dateobs<-dateobs[zdv]

#--look at frequencies and quartiles for the dep. varb.--
summary(depvar)
table(depvar)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
"fempower","sexratio","war","himilexp","wagelabor","famsize","settype",
"localjh","money","cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade","dateobs",
"ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
"temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg",
"nonmatrel","lrgfam","malesexag","segadlboys","agrlateboy") ###ADDED

#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--create spatially lagged dep. varbs. in stage 1 OLS--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx   
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+dateobs+            
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg+
nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy,data=m9) ###ADDED 
#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of restricted model--
###xR<-lm(depvar ~ fyll + cultints + roots + fish + 
###    exogamy + settype + femsubs, data = m9)
   xR<-lm(depvar~fyll+fydd+ ###
superjh+                    ###
malesexag+agrlateboy        ###
, data = m9) 
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim","dateobs",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"fydd","wagelabor","war","himilexp","tree","foodtrade")


#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(e)$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc
#Corrected to publication version with depvarname
#--write results to csv file for perusal in spreadsheet--
write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] B| truly no results --> you need some new independent variables

But now your xR<- need only add the variables you will add now, all others non-significant ! One strategy now is to use Spss and do correlation matrices with the v667 rape variable.


             2SLS model for rape
>  bbb
              coef Fstat       ddf pvalue   VIF
(Intercept)  1.439 1.587 17872.461  0.208    NA
fyll        -0.370 0.239 10675.774  0.625 1.334
fydd         0.194 0.148  3286.862  0.701 1.266
superjh     -0.029 0.319 13123.134  0.572 1.168
malesexag    0.045 1.028    47.386  0.316 1.056 <-- this was significant earlier but not now
agrlateboy   0.043 2.116   361.324  0.147 1.059 <-- your "best" variable is not significant
>  r2
 R2:final model R2:IV(distance) R2:IV(language) 
     0.06647616      0.92801682      0.94216431 <-- no significant r2
>  ccc                                          DONT LOSE SIGHT OF THE POSITIVE FACT THAT 
                                                YOU HAVE *ELIMINATED* ALOT OF indep VARIABLES
                 Fstat         df pvalue
RESET           -0.015     73.816  1.000
Wald on restrs. -0.002 116384.622  1.000
NCV              0.128    273.385  0.721
SWnormal        27.846    320.802  0.000 <-- 
lagll            0.532 835977.478  0.466
lagdd            0.589 310099.987  0.443

>

[edit] A| Amanda adding new Peggy Sanday (her author) codes

Program 1 --> Program 2
#MI--make the imputed datasets
#--change the following path to the directory with your data and program--
setwd("C:/My Documents/MI")
rm(list=ls(all=TRUE))
options(echo=TRUE)
#--you need the following two packages--you must install them first--
library(foreign)
library(mice)
library(tripak)
library(zoo)
library(sp)
library(maptools)
library(spam)

#--To find the citation for a package, use this function:---
citation("mice")

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in auxiliary variables---
load("vaux.Rdata",.GlobalEnv)
row.names(vaux)<-NULL
#--Read in the SCCS dataset---
load("SCCS.Rdata",.GlobalEnv)

#--look at first 6 rows of vaux--
head(vaux)
#--look at field names of vaux--
names(vaux)
#--check to see that rows are properly aligned in the two datasets--
#--sum should equal 186---
sum((SCCS$socname==vaux$socname)*1)
#--remove the society name field--
vaux<-vaux[,-28]
names(vaux)

#--Two nominal variables: brg and rlg----
#--brg: consolidated Burton  Regions-----
#0 = (rest of world) circumpolar, South and Meso-America, west North America
#1 = Subsaharan Africa
#2 = Middle Old World
#3 = Southeast Asia, Insular Pacific, Sahul
#4 = Eastern Americas
#--rlg: Religion---
#'0 (no world religion)'  
#'1 (Christianity)'  
#'2 (Islam)'  
#'3 (Hindu/Buddhist)'  

#--check to see number of missing values in vaux, 
#--whether variables are numeric,
#--and number of discrete values for each variable---
vvn<-names(vaux)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(vaux[,vvn[i]])))
numeric<-is.numeric(vaux[,vvn[i]])
numDiscrVals<-length(table(vaux[,vvn[i]]))
pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
}
row.names(pp)<-vvn
pp

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--extract variables to be used from SCCS, put in dataframe fx--
fx<-data.frame(
socname=SCCS$socname,socID=SCCS$"sccs#",
valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
dateobs=SCCS$v838,cultints=SCCS$v232,roots=(SCCS$v233==5)*1,
cereals=(SCCS$v233==6)*1,gath=SCCS$v203,hunt=SCCS$v204,
fish=SCCS$v205,anim=SCCS$v206,femsubs=SCCS$v890,
pigs=(SCCS$v244==2)*1,milk=(SCCS$v245>1)*1,plow=(SCCS$v243>1)*1,
bovines=(SCCS$v244==7)*1,tree=(SCCS$v233==4)*1,
foodtrade=SCCS$v819,foodscarc=SCCS$v1685,
ecorich=SCCS$v857,popdens=SCCS$v156,pathstress=SCCS$v1260,
CVrain=SCCS$v1914/SCCS$v1913,rain=SCCS$v854,temp=SCCS$v855,
AP1=SCCS$v921,AP2=SCCS$v928,ndrymonth=SCCS$v196,
exogamy=SCCS$v72,ncmallow=SCCS$v227,famsize=SCCS$v80,
settype=SCCS$v234,localjh=(SCCS$v236-1),superjh=SCCS$v237,
moralgods=SCCS$v238,fempower=SCCS$v663,
sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
war=SCCS$v1648,himilexp=(SCCS$v899==1)*1,
money=SCCS$v155,wagelabor=SCCS$v1732,
migr=(SCCS$v677==2)*1,brideprice=(SCCS$v208==1)*1,
nuclearfam=(SCCS$v210<=3)*1,pctFemPolyg=SCCS$v872,
nonmatrel=SCCS$v52,lrgfam=SCCS$v68,malesexag=SCCS$v175,
segadlboys=SCCS$v242,agrlateboy=SCCS$v300,postsextab=SCCS$v240,
postsextab2=SCCS$v34,mavofemsex=SCCS$v672,wivhotgr=SCCS$v668,
freintovio=SCCS$v666,ideomaltough=SCCS$v664,maleseg=SCCS$v665,
war=SCCS$v679,   ###maleagr=SCCS$v669,fempolpar=SCCS$v661,femsolgro=SCCS$v662,
fempow=SCCS$v663) ###ADDED

#--look at first 6 rows of fx--
head(fx)

#--check to see number of missing values--
#--also check whether numeric--
vvn<-names(fx)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(fx[,vvn[i]])))
numeric<-is.numeric(fx[,vvn[i]])
pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
}
row.names(pp)<-vvn
pp

#--identify variables with missing values--
z<-which(pp[,1]>0)
zv1<-vvn[z]
zv1
#--identify variables with non-missing values--
z<-which(pp[,1]==0)
zv2<-vvn[z]
zv2

#-----------------------------
#----Multiple imputation------
#-----------------------------

#--number of imputed data sets to create--
nimp<-10
#--one at a time, loop through those variables with missing values--
for (i in 1:length(zv1)){
#--attach the imputand to the auxiliary data--
zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
#--in the following line, the imputation is done--
aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")
#--during first iteration of the loop, create dataframe impdat--
if (i==1){
impdat<-data.frame(aqq[,c(".id",".imp")])
}
#--the imputand is placed as a field in impdat and named--
impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
names(impdat)[NCOL(impdat)]<-zv1[i]
}

#--now the non-missing variables are attached to impdat--
gg<-NULL
for (i in 1:nimp){
gg<-rbind(gg,data.frame(fx[,zv2]))
}
impdat<-cbind(impdat,gg)

#--take a look at the top 6 and bottom 6 rows of impdat--
head(impdat)
tail(impdat)

#--impdat is saved as an R-format data file--
save(impdat,file="impdat.Rdata")


 
Program 2
#MI--estimate model with network-lagged dependent variables, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)

#HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#--can replace "sum" with "max"
###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) 
depvar<-SCCS$v667###NEW
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
###depvarname<-"childvar"
depvarname<-"rape"
#--can add additional SCCS variable, but only if it has no missing values---
#dateobs<-SCCS$v838
#dateobs<-dateobs[zdv]

#--look at frequencies and quartiles for the dep. varb.--
summary(depvar)
table(depvar)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
"fempower","sexratio","war","himilexp","wagelabor","famsize","settype",
"localjh","money","cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade","dateobs",
"ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
"temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg",
"nonmatrel","lrgfam","malesexag","segadlboys","agrlateboy","postsextab",
"postsextab2","mavofemsex","wivhotgr",
"freintovio","ideomaltough","maleseg",
"war",   ###"maleagr",""fempolpar","femsolgro",
"fempow") ###ADDED

#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--create spatially lagged dep. varbs. in stage 1 OLS--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx   
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+dateobs+            
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg+
nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy+postsextab+
postsextab2+mavofemsex+wivhotgr+
freintovio+ideomaltough+maleseg+
war+   ###maleagr+fempolpar+femsolgro+
fempow,data=m9) ###ADDED 
#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of restricted model--
###xR<-lm(depvar ~ fyll + cultints + roots + fish + 
###    exogamy + settype + femsubs, data = m9)
   xR<-lm(depvar~fyll+fydd+ ###
superjh+                    ###
malesexag+agrlateboy        ###
, data = m9) 
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim","dateobs",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"fydd","wagelabor","war","himilexp","tree","foodtrade")


#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(e)$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc
#Corrected to publication version with depvarname
#--write results to csv file for perusal in spreadsheet--
write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] B|Results w/out Sanday variables

>  bbb
              coef Fstat       ddf pvalue   VIF
(Intercept)  2.021 3.082  5290.986  0.079    NA
fyll        -0.486 0.425  8862.637  0.515 1.319
fydd        -0.013 0.001 18969.941  0.978 1.243
superjh     -0.034 0.418  9800.986  0.518 1.208
malesexag    0.037 0.961   248.045  0.328 1.091
agrlateboy   0.035 1.405  1375.753  0.236 1.067
>  r2
 R2:final model R2:IV(distance) R2:IV(language) 
     0.05755384      0.96579726      0.97365918 
>  ccc
                 Fstat         df pvalue
RESET            0.437    826.400  0.509
Wald on restrs. -0.328     17.269  1.000
NCV              0.271   3109.778  0.603
SWnormal        34.258   1787.713  0.000
lagll            0.188 144138.539  0.664
lagdd            0.165 166556.689  0.684
> 

[edit] A|PLEASE WORK, PROGRAM!!!!!!!!!!

  • Everytime you start a new program, please say exactly where it is from.
  • Should have started from ==15 A! So Drop...==

Program1 works but there is an error in Program 2 and bbb doesnt print. The error is: Error in shapiro.test(e) : all 'x' values are identical In addition: Warning messages:

1: In lm.LMtests(xR, wmatll, test = c("LMlag")) :
 Spatial weights matrix not row standardized
2: In lm.LMtests(xR, wmatdd, test = c("LMlag")) :
 Spatial weights matrix not row standardized
3: In lm.LMtests(xR, wmatll, test = c("LMlag")) :
 Spatial weights matrix not row standardized
4: In lm.LMtests(xR, wmatdd, test = c("LMlag")) :
 Spatial weights matrix not row standardized

So probably the addition of new independent variables had and error. DRW will copy into a new ==A| ...== and try to find the error.

 Program 1 --> Program 2
 #MI--make the imputed datasets
 #--change the following path to the directory with your data and program--
 setwd("C:/My Documents/MI")
 rm(list=ls(all=TRUE))
 options(echo=TRUE)
 #--you need the following two packages--you must install them first--
 library(foreign)
 library(mice)
 library(tripak)
 library(zoo)
 library(sp)
 library(maptools)
 library(spam)
 
 #--To find the citation for a package, use this function:---
 citation("mice")
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in auxiliary variables---
 load("vaux.Rdata",.GlobalEnv)
 row.names(vaux)<-NULL
 #--Read in the SCCS dataset---
 load("SCCS.Rdata",.GlobalEnv)
 
 #--look at first 6 rows of vaux--
 head(vaux)
 #--look at field names of vaux--
 names(vaux)
 #--check to see that rows are properly aligned in the two datasets--
 #--sum should equal 186---
 sum((SCCS$socname==vaux$socname)*1)
 #--remove the society name field--
 vaux<-vaux[,-28]
 names(vaux)
 
 #--Two nominal variables: brg and rlg----
 #--brg: consolidated Burton  Regions-----
 #0 = (rest of world) circumpolar, South and Meso-America, west North America
 #1 = Subsaharan Africa
 #2 = Middle Old World
 #3 = Southeast Asia, Insular Pacific, Sahul
 #4 = Eastern Americas
 #--rlg: Religion---
 #'0 (no world religion)'  
 #'1 (Christianity)'  
 #'2 (Islam)'  
 #'3 (Hindu/Buddhist)'  
 
 #--check to see number of missing values in vaux, 
 #--whether variables are numeric,
 #--and number of discrete values for each variable---
 vvn<-names(vaux)
 pp<-NULL
 for (i in 1:length(vvn)){
 nmiss<-length(which(is.na(vaux[,vvn[i]])))
 numeric<-is.numeric(vaux[,vvn[i]])
 numDiscrVals<-length(table(vaux[,vvn[i]]))
 pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
 }
 row.names(pp)<-vvn
 pp
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 #--extract variables to be used from SCCS, put in dataframe fx--
 fx<-data.frame(
 socname=SCCS$socname,socID=SCCS$"sccs#",
 valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
 dateobs=SCCS$v838,cultints=SCCS$v232,roots=(SCCS$v233==5)*1,
 cereals=(SCCS$v233==6)*1,gath=SCCS$v203,hunt=SCCS$v204,
 fish=SCCS$v205,anim=SCCS$v206,femsubs=SCCS$v890,
 pigs=(SCCS$v244==2)*1,milk=(SCCS$v245>1)*1,plow=(SCCS$v243>1)*1,
 bovines=(SCCS$v244==7)*1,tree=(SCCS$v233==4)*1,
 foodtrade=SCCS$v819,foodscarc=SCCS$v1685,
 ecorich=SCCS$v857,popdens=SCCS$v156,pathstress=SCCS$v1260,
 CVrain=SCCS$v1914/SCCS$v1913,rain=SCCS$v854,temp=SCCS$v855,
 AP1=SCCS$v921,AP2=SCCS$v928,ndrymonth=SCCS$v196,
 exogamy=SCCS$v72,ncmallow=SCCS$v227,famsize=SCCS$v80,
 settype=SCCS$v234,localjh=(SCCS$v236-1),superjh=SCCS$v237,
 moralgods=SCCS$v238,fempower=SCCS$v663,
 sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
 war=SCCS$v1648,himilexp=(SCCS$v899==1)*1,
 money=SCCS$v155,wagelabor=SCCS$v1732,
 migr=(SCCS$v677==2)*1,brideprice=(SCCS$v208==1)*1,
 nuclearfam=(SCCS$v210<=3)*1,pctFemPolyg=SCCS$v872,
 nonmatrel=SCCS$v52,lrgfam=SCCS$v68,malesexag=SCCS$v175,          ###ADDED
 segadlboys=SCCS$v242,agrlateboy=SCCS$v300,postsextab=SCCS$v240,  ###ADDED
 postsextab2=SCCS$v34,mavofemsex=SCCS$v672,wivhotgr=SCCS$v668,    ###ADDED
 freintovio=SCCS$v666,ideomaltough=SCCS$v664,maleseg=SCCS$v665,   ###ADDED
 war=SCCS$v679,   ###maleagr=SCCS$v669,fempolpar=SCCS$v661,femsolgro=SCCS$v662,  ###IN v663
 fempow=SCCS$v663) ###ADDED
 
 #--look at first 6 rows of fx--
 head(fx)
 
 #--check to see number of missing values--
 #--also check whether numeric--
 vvn<-names(fx)
 pp<-NULL
 for (i in 1:length(vvn)){
 nmiss<-length(which(is.na(fx[,vvn[i]])))
 numeric<-is.numeric(fx[,vvn[i]])
 pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
 }
 row.names(pp)<-vvn
 pp
 
 #--identify variables with missing values--
 z<-which(pp[,1]>0)
 zv1<-vvn[z]
 zv1
 #--identify variables with non-missing values--
 z<-which(pp[,1]==0)
 zv2<-vvn[z]
 zv2
 
 #-----------------------------
 #----Multiple imputation------
 #-----------------------------
 
 #--number of imputed data sets to create--
 nimp<-10
 #--one at a time, loop through those variables with missing values--
 for (i in 1:length(zv1)){
 #--attach the imputand to the auxiliary data--
 zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
 #--in the following line, the imputation is done--
 aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")
 #--during first iteration of the loop, create dataframe impdat--
 if (i==1){
 impdat<-data.frame(aqq[,c(".id",".imp")])
 }
 #--the imputand is placed as a field in impdat and named--
 impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
 names(impdat)[NCOL(impdat)]<-zv1[i]
 }
 
 #--now the non-missing variables are attached to impdat--
 gg<-NULL
 for (i in 1:nimp){
 gg<-rbind(gg,data.frame(fx[,zv2]))
 }
 impdat<-cbind(impdat,gg)
 
 #--take a look at the top 6 and bottom 6 rows of impdat--
 head(impdat)
 tail(impdat)
 
 #--impdat is saved as an R-format data file--
 save(impdat,file="impdat.Rdata")
 
 
  
 Program 2
 #MI--estimate model with network-lagged dependent variables, combine results
 rm(list=ls(all=TRUE))
 #--Set path to your directory with data and program--
 setwd("C:/My Documents/MI")
 options(echo=TRUE)
 
 #--need these packages for estimation and diagnostics--
 library(foreign)
 library(spdep)
 library(car)
 library(lmtest)
 library(sandwich)
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in original SCCS data---
 load("SCCS.Rdata",.GlobalEnv)
 #--Read in two weight matrices--
 ll<-as.matrix(read.dta("langwm.dta")[,-1])
 dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
 #--Read in the imputed dataset---
 load("impdat.Rdata",.GlobalEnv)
 
 #HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
 #--create dep.varb. you wish to use from SCCS data--
 #--Here we sum variables measuring how much a society values children--
 #--can replace "sum" with "max"
 ###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) 
 ###depvar<-SCCS$v667###NOT THAT BUT THIS
 depvar<-SCCS$v664###NEW
 #--find obs. for which dep. varb. is non-missing--
 zdv<-which(!is.na(depvar))
 depvar<-depvar[zdv]
 #HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
 ###depvarname<-"childvar"
 depvarname<-"rape"  ###667. 
 #--can add additional SCCS variable, but only if it has no missing values---
 #dateobs<-SCCS$v838
 #dateobs<-dateobs[zdv]
 
 #--look at frequencies and quartiles for the dep. varb.--
 summary(depvar)
 table(depvar)
 
 #--modify weight matrices---
 #--set diagonal equal to zeros--
 diag(ll)<-0
 diag(dd)<-0
 #--use only obs. where dep. varb. non-missing--
 ll<-ll[zdv,zdv]
 dd<-dd[zdv,zdv]
 #--row standardize (rows sum to one)
 ll<-ll/rowSums(ll)
 dd<-dd/rowSums(dd)
 #--make weight matrix object for later autocorrelation test--
 wmatll<-mat2listw(as.matrix(ll))
 wmatdd<-mat2listw(as.matrix(dd))
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
 "fempower","sexratio","war","himilexp","wagelabor","famsize","settype",
 "localjh","money","cultints","roots","cereals","gath","hunt","fish",
 "anim","pigs","milk","plow","bovines","tree","foodtrade","dateobs",
 "ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
 "temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg",      
 "nonmatrel","lrgfam","malesexag","segadlboys","agrlateboy","postsextab", ###ADDED
 "postsextab2","mavofemsex","wivhotgr",                                   ###ADDED
 "freintovio","ideomaltough","maleseg",                                   ###ADDED
 "war")  ###"maleagr","fempolpar","femsolgro")                                 ###ADDED
 ###"fempow") ###ADDED
 
 #-----------------------------------------------------
 #---Estimate model on each imputed dataset------------
 #-----------------------------------------------------
 
 #--number of imputed datasets--
 nimp<-10
 
 #--will append values to these empty objects--
 vif<-NULL
 ss<-NULL
 beta<-NULL
 dng<-NULL
 
 #--loop through the imputed datasets--
 for (i in 1:nimp){
 
 #--select the ith imputed dataset--
 m9<-impdat[which(impdat$.imp==i),]
 #--retain only obs. for which dep. varb. is nonmissing--
 m9<-m9[zdv,]
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 #--create spatially lagged dep. varbs. in stage 1 OLS--
 y<-as.matrix(depvar)
 xx<-as.matrix(m9[,indpv])
 #--for instruments we use the spatial lag of our indep. varbs.--
 #--First, the spatially lagged varb. for distance--
 xdy<-dd%*%xx
 cyd<-dd%*%y
 o<-lm(cyd~xdy)
 #--the fitted value is our instrumental variable--
 fydd<-fitted(o)
 #--keep R2 from this regression--
 dr2<-summary(o)$r.squared
 #--Then, the spatially lagged varb. for language--
 xly<-ll%*%xx   
 cyl<-ll%*%y
 o<-lm(cyl~xly)
 #--the fitted value is our instrumental variable--
 fyll<-fitted(o)
 #--keep R2 from this regression--
 lr2<-summary(o)$r.squared
 m9<-cbind(m9,fydd,fyll)
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 #--Stage 2 OLS estimate of unrestricted model--
 xUR<-lm(depvar~fyll+fydd+dateobs+            
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
 settype+localjh+superjh+moralgods+fempower+femsubs+
 sexratio+war+himilexp+money+wagelabor+
 migr+brideprice+nuclearfam+pctFemPolyg+
 nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy+postsextab+
 postsextab2+mavofemsex+wivhotgr+
 freintovio+ideomaltough+maleseg+
 war+   ###maleagr+fempolpar+femsolgro+
 fempow,data=m9) ###ADDED 
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 #--Stage 2 OLS estimate of restricted model--
 ###xR<-lm(depvar ~ fyll + cultints + roots + fish + 
 ###    exogamy + settype + femsubs, data = m9)
    xR<-lm(depvar~fyll+fydd+dateobs+
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
 settype+localjh+superjh+moralgods+fempower+femsubs+
 sexratio+war+himilexp+money+wagelabor+
 migr+brideprice+nuclearfam+pctFemPolyg
 +nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy+postsextab+
 postsextab2+mavofemsex+wivhotgr+
 freintovio+ideomaltough+maleseg+
 war+   ###maleagr+fempolpar+femsolgro+
 fempow, data = m9) ###ADDED 
 
 #--corrected sigma2 and R2 for 2SLS--
 qxx<-m9
 qxx[,"fydd"]<-cyd
 qxx[,"fyll"]<-cyl
 b<-coef(xR)
 incpt<-matrix(1,NROW(qxx),1)
 x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
 e<-y-x%*%as.matrix(b)
 cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
 cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
 
 #--collect coefficients and their variances--
 ov<-summary(xR)
 vif<-rbind(vif,vif(xR))
 ss<-rbind(ss,diag(ov$cov*cs2))
 #--collect robust coef. variances when there is heteroskedasticity--
 #eb<-e^2
 #x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
 #hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
 #ss<-rbind(ss,diag(hcm))
 beta<-rbind(beta,coef(xR))
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 #--collect some model diagnostics--
 dropt<-c("cereals","gath","plow","hunt","anim","dateobs",
 "pigs","milk","bovines","foodscarc","ecorich",
 "popdens","pathstress","ncmallow","famsize","localjh",
 "superjh","moralgods","fempower","sexratio","money",
 "fydd","wagelabor","war","himilexp","tree","foodtrade")
 
 
 #--Ramsey RESET test--
 p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
 #--Wald test (H0: dropped variables have coefficient equal zero)--
 o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
 p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
 #--Heteroskedasticity test (H0: homoskedastic residuals)--
 p3<-ncv.test(xR)$ChiSquare
 #--Shapiro-Wilke normality test (H0: residuals normal)
 p4<-qchisq(shapiro.test(e)$p.value,1,lower.tail=FALSE)
 #--LaGrange Multiplier test for spatial autocorrelation: language--
 o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
 p5<-as.numeric(o$LMlag$statistic)
 #--LaGrange Multiplier test for spatial autocorrelation: distance--
 o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
 p6<-as.numeric(o$LMlag$statistic)
 #--model R2--
 p7<-cr2
 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
 
 }
 
 #--------------------------------------------
 #--Rubin's formulas for combining estimates--
 #--------------------------------------------
 
 #--first find final regr. coefs. and p-values--
 mnb<-apply(beta,2,mean)
 vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
 mnv<-apply(ss,2,mean)
 vrT<-mnv+vrb*(1-nimp^(-1))
 fst<-mnb^2/vrT
 r<-(1+nimp^(-1))*vrb/mnv
 v<-(nimp-1)*(1+r^(-1))^2
 pval<-pf(fst,1,v,lower.tail=FALSE)
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
 
 #--Then combine the diagnostics we collected--
 dng<-data.frame(dng)
 names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
 "R2:final model","R2:IV(distance)","R2:IV(language)")
 r2<-apply(dng[,7:9],2,mean)
 adng<-dng[,1:6]
 mdm<-apply(adng,2,mean)
 vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
 aa<-4*mdm^2-2*vrd
 aa[which(aa<0)]<-0
 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
 vd<-(nimp-1)*(1+rd^(-1))^2
 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
 #-All chi-sq we collected have df=1-------
 pvald<-pf(Dm,1,vd,lower.tail=FALSE)
 ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
 names(ccc)<-c("Fstat","df","pvalue")
 
 bbb
 r2
 ccc
 
 #Corrected to publication version with depvarname
 #--write results to csv file for perusal in spreadsheet--
 write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
 write.csv(bbb,file="OLSresults.csv",append=TRUE)
 write.csv(r2,file="OLSresults.csv",append=TRUE)
 write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] A| DRW Attempt to fix

You started from full xUR not restricted xR so I stripped out extra XR<- elements

 Program 1 --> Program 2
 #MI--make the imputed datasets
 #--change the following path to the directory with your data and program--
 setwd("C:/My Documents/MI")
 rm(list=ls(all=TRUE))
 options(echo=TRUE)
 #--you need the following two packages--you must install them first--
 library(foreign)
 library(mice)
 library(tripak)
 library(zoo)
 library(sp)
 library(maptools)
 library(spam)
 
 #--To find the citation for a package, use this function:---
 citation("mice")
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in auxiliary variables---
 load("vaux.Rdata",.GlobalEnv)
 row.names(vaux)<-NULL
 #--Read in the SCCS dataset---
 load("SCCS.Rdata",.GlobalEnv)
 
 #--look at first 6 rows of vaux--
 head(vaux)
 #--look at field names of vaux--
 names(vaux)
 #--check to see that rows are properly aligned in the two datasets--
 #--sum should equal 186---
 sum((SCCS$socname==vaux$socname)*1)
 #--remove the society name field--
 vaux<-vaux[,-28]
 names(vaux)
 
 #--Two nominal variables: brg and rlg----
 #--brg: consolidated Burton  Regions-----
 #0 = (rest of world) circumpolar, South and Meso-America, west North America
 #1 = Subsaharan Africa
 #2 = Middle Old World
 #3 = Southeast Asia, Insular Pacific, Sahul
 #4 = Eastern Americas
 #--rlg: Religion---
 #'0 (no world religion)'  
 #'1 (Christianity)'  
 #'2 (Islam)'  
 #'3 (Hindu/Buddhist)'  
 
 #--check to see number of missing values in vaux, 
 #--whether variables are numeric,
 #--and number of discrete values for each variable---
 vvn<-names(vaux)
 pp<-NULL
 for (i in 1:length(vvn)){
 nmiss<-length(which(is.na(vaux[,vvn[i]])))
 numeric<-is.numeric(vaux[,vvn[i]])
 numDiscrVals<-length(table(vaux[,vvn[i]]))
 pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
 }
 row.names(pp)<-vvn
 pp
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 #--extract variables to be used from SCCS, put in dataframe fx--
 fx<-data.frame(
 socname=SCCS$socname,socID=SCCS$"sccs#",
 valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
 dateobs=SCCS$v838,cultints=SCCS$v232,roots=(SCCS$v233==5)*1,
 cereals=(SCCS$v233==6)*1,gath=SCCS$v203,hunt=SCCS$v204,
 fish=SCCS$v205,anim=SCCS$v206,femsubs=SCCS$v890,
 pigs=(SCCS$v244==2)*1,milk=(SCCS$v245>1)*1,plow=(SCCS$v243>1)*1,
 bovines=(SCCS$v244==7)*1,tree=(SCCS$v233==4)*1,
 foodtrade=SCCS$v819,foodscarc=SCCS$v1685,
 ecorich=SCCS$v857,popdens=SCCS$v156,pathstress=SCCS$v1260,
 CVrain=SCCS$v1914/SCCS$v1913,rain=SCCS$v854,temp=SCCS$v855,
 AP1=SCCS$v921,AP2=SCCS$v928,ndrymonth=SCCS$v196,
 exogamy=SCCS$v72,ncmallow=SCCS$v227,famsize=SCCS$v80,
 settype=SCCS$v234,localjh=(SCCS$v236-1),superjh=SCCS$v237,
 moralgods=SCCS$v238,fempower=SCCS$v663,
 sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
 war=SCCS$v1648,himilexp=(SCCS$v899==1)*1,
 money=SCCS$v155,wagelabor=SCCS$v1732,
 migr=(SCCS$v677==2)*1,brideprice=(SCCS$v208==1)*1,
 nuclearfam=(SCCS$v210<=3)*1,pctFemPolyg=SCCS$v872,
 nonmatrel=SCCS$v52,lrgfam=SCCS$v68,malesexag=SCCS$v175,
 segadlboys=SCCS$v242,agrlateboy=SCCS$v300,postsextab=SCCS$v240,
 postsextab2=SCCS$v34,mavofemsex=SCCS$v672,wivhotgr=SCCS$v668,
 freintovio=SCCS$v666,ideomaltough=SCCS$v664,maleseg=SCCS$v665,
 war=SCCS$v679) ###maleagr=SCCS$v669,,fempolpar=SCCS$v661,femsolgro=SCCS$v662) ###ADDED 
 ###fempow=SCCS$v663) ###ADDED
 
 #--look at first 6 rows of fx--
 head(fx)
 
 #--check to see number of missing values--
 #--also check whether numeric--
 vvn<-names(fx)
 pp<-NULL
 for (i in 1:length(vvn)){
 nmiss<-length(which(is.na(fx[,vvn[i]])))
 numeric<-is.numeric(fx[,vvn[i]])
 pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
 }
 row.names(pp)<-vvn
 pp
 
 #--identify variables with missing values--
 z<-which(pp[,1]>0)
 zv1<-vvn[z]
 zv1
 #--identify variables with non-missing values--
 z<-which(pp[,1]==0)
 zv2<-vvn[z]
 zv2
 
 #-----------------------------
 #----Multiple imputation------
 #-----------------------------
 
 #--number of imputed data sets to create--
 nimp<-10
 #--one at a time, loop through those variables with missing values--
 for (i in 1:length(zv1)){
 #--attach the imputand to the auxiliary data--
 zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
 #--in the following line, the imputation is done--
 aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")
 #--during first iteration of the loop, create dataframe impdat--
 if (i==1){
 impdat<-data.frame(aqq[,c(".id",".imp")])
 }
 #--the imputand is placed as a field in impdat and named--
 impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
 names(impdat)[NCOL(impdat)]<-zv1[i]
 }
 
 #--now the non-missing variables are attached to impdat--
 gg<-NULL
 for (i in 1:nimp){
 gg<-rbind(gg,data.frame(fx[,zv2]))
 }
 impdat<-cbind(impdat,gg)
 
 #--take a look at the top 6 and bottom 6 rows of impdat--
 head(impdat)
 tail(impdat)
 
 #--impdat is saved as an R-format data file--
 save(impdat,file="impdat.Rdata")
 
 
  
 Program 2
 #MI--estimate model with network-lagged dependent variables, combine results
 rm(list=ls(all=TRUE))
 #--Set path to your directory with data and program--
 setwd("C:/My Documents/MI")
 options(echo=TRUE)
 
 #--need these packages for estimation and diagnostics--
 library(foreign)
 library(spdep)
 library(car)
 library(lmtest)
 library(sandwich)
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in original SCCS data---
 load("SCCS.Rdata",.GlobalEnv)
 #--Read in two weight matrices--
 ll<-as.matrix(read.dta("langwm.dta")[,-1])
 dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
 #--Read in the imputed dataset---
 load("impdat.Rdata",.GlobalEnv)
 
 #HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
 #--create dep.varb. you wish to use from SCCS data--
 #--Here we sum variables measuring how much a society values children--
 #--can replace "sum" with "max"
 ###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) 
 ###depvar<-SCCS$v667###NOT THAT BUT THIS
 depvar<-SCCS$v664###NEW
 #--find obs. for which dep. varb. is non-missing--
 zdv<-which(!is.na(depvar))
 depvar<-depvar[zdv]
 #HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
 ###depvarname<-"childvar"
 depvarname<-"rape"  ###667. 
 #--can add additional SCCS variable, but only if it has no missing values---
 #dateobs<-SCCS$v838
 #dateobs<-dateobs[zdv]
 
 #--look at frequencies and quartiles for the dep. varb.--
 summary(depvar)
 table(depvar)
 
 #--modify weight matrices---
 #--set diagonal equal to zeros--
 diag(ll)<-0
 diag(dd)<-0
 #--use only obs. where dep. varb. non-missing--
 ll<-ll[zdv,zdv]
 dd<-dd[zdv,zdv]
 #--row standardize (rows sum to one)
 ll<-ll/rowSums(ll)
 dd<-dd/rowSums(dd)
 #--make weight matrix object for later autocorrelation test--
 wmatll<-mat2listw(as.matrix(ll))
 wmatdd<-mat2listw(as.matrix(dd))
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
 "fempower","sexratio","war","himilexp","wagelabor","famsize","settype",
 "localjh","money","cultints","roots","cereals","gath","hunt","fish",
 "anim","pigs","milk","plow","bovines","tree","foodtrade","dateobs",
 "ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
 "temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg",
 "nonmatrel","lrgfam","malesexag","segadlboys","agrlateboy","postsextab",  ###ADDED
 "postsextab2","mavofemsex","wivhotgr",    ###ADDED
 "freintovio","ideomaltough","maleseg",    ###ADDED
 "war") ###"maleagr", ,"fempolpar","femsolgro")  ###ADDED
 ###"fempow") ###ADDED
 
 #-----------------------------------------------------
 #---Estimate model on each imputed dataset------------
 #-----------------------------------------------------
 
 #--number of imputed datasets--
 nimp<-10
 
 #--will append values to these empty objects--
 vif<-NULL
 ss<-NULL
 beta<-NULL
 dng<-NULL
 
 #--loop through the imputed datasets--
 for (i in 1:nimp){
 
 #--select the ith imputed dataset--
 m9<-impdat[which(impdat$.imp==i),]
 #--retain only obs. for which dep. varb. is nonmissing--
 m9<-m9[zdv,]
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 #--create spatially lagged dep. varbs. in stage 1 OLS--
 y<-as.matrix(depvar)
 xx<-as.matrix(m9[,indpv])
 #--for instruments we use the spatial lag of our indep. varbs.--
 #--First, the spatially lagged varb. for distance--
 xdy<-dd%*%xx
 cyd<-dd%*%y
 o<-lm(cyd~xdy)
 #--the fitted value is our instrumental variable--
 fydd<-fitted(o)
 #--keep R2 from this regression--
 dr2<-summary(o)$r.squared
 #--Then, the spatially lagged varb. for language--
 xly<-ll%*%xx   
 cyl<-ll%*%y
 o<-lm(cyl~xly)
 #--the fitted value is our instrumental variable--
 fyll<-fitted(o)
 #--keep R2 from this regression--
 lr2<-summary(o)$r.squared
 m9<-cbind(m9,fydd,fyll)
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 #--Stage 2 OLS estimate of unrestricted model--
 xUR<-lm(depvar~fyll+fydd+dateobs+            
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
 settype+localjh+superjh+moralgods+fempower+femsubs+
 sexratio+war+himilexp+money+wagelabor+
 migr+brideprice+nuclearfam+pctFemPolyg+
 nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy+postsextab+
 postsextab2+mavofemsex+wivhotgr+
 freintovio+ideomaltough+maleseg+
 war)    ###maleagr++fempolpar+femsolgro) ###ADDED
 ###fempow,data=m9) ###ADDED 
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 #--Stage 2 OLS estimate of restricted model--
 ###xR<-lm(depvar ~ fyll + cultints + roots + fish + 
 ###    exogamy + settype + femsubs, data = m9)
    xR<-lm(depvar~fyll+fydd
 +nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy+postsextab+
 postsextab2+mavofemsex+wivhotgr+
 freintovio+ideomaltough+maleseg+
 war)   ###maleagr++fempolpar+femsolgro
 , data = m9) ###ADDED 
 ###fempow, data = m9) ###ADDED 


 #--corrected sigma2 and R2 for 2SLS--
 qxx<-m9
 qxx[,"fydd"]<-cyd
 qxx[,"fyll"]<-cyl
 b<-coef(xR)
 incpt<-matrix(1,NROW(qxx),1)
 x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
 e<-y-x%*%as.matrix(b)
 cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
 cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
 
 #--collect coefficients and their variances--
 ov<-summary(xR)
 vif<-rbind(vif,vif(xR))
 ss<-rbind(ss,diag(ov$cov*cs2))
 #--collect robust coef. variances when there is heteroskedasticity--
 #eb<-e^2
 #x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
 #hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
 #ss<-rbind(ss,diag(hcm))
 beta<-rbind(beta,coef(xR))
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 #--collect some model diagnostics--
 dropt<-c("cereals","gath","plow","hunt","anim","dateobs",
 "pigs","milk","bovines","foodscarc","ecorich",
 "popdens","pathstress","ncmallow","famsize","localjh",
 "superjh","moralgods","fempower","sexratio","money",
 "fydd","wagelabor","war","himilexp","tree","foodtrade")
 
 
 #--Ramsey RESET test--
 p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
 #--Wald test (H0: dropped variables have coefficient equal zero)--
 o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
 p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
 #--Heteroskedasticity test (H0: homoskedastic residuals)--
 p3<-ncv.test(xR)$ChiSquare
 #--Shapiro-Wilke normality test (H0: residuals normal)
 p4<-qchisq(shapiro.test(e)$p.value,1,lower.tail=FALSE)
 #--LaGrange Multiplier test for spatial autocorrelation: language--
 o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
 p5<-as.numeric(o$LMlag$statistic)
 #--LaGrange Multiplier test for spatial autocorrelation: distance--
 o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
 p6<-as.numeric(o$LMlag$statistic)
 #--model R2--
 p7<-cr2
 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
 
 }
 
 #--------------------------------------------
 #--Rubin's formulas for combining estimates--
 #--------------------------------------------
 
 #--first find final regr. coefs. and p-values--
 mnb<-apply(beta,2,mean)
 vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
 mnv<-apply(ss,2,mean)
 vrT<-mnv+vrb*(1-nimp^(-1))
 fst<-mnb^2/vrT
 r<-(1+nimp^(-1))*vrb/mnv
 v<-(nimp-1)*(1+r^(-1))^2
 pval<-pf(fst,1,v,lower.tail=FALSE)
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
 
 #--Then combine the diagnostics we collected--
 dng<-data.frame(dng)
 names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
 "R2:final model","R2:IV(distance)","R2:IV(language)")
 r2<-apply(dng[,7:9],2,mean)
 adng<-dng[,1:6]
 mdm<-apply(adng,2,mean)
 vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
 aa<-4*mdm^2-2*vrd
 aa[which(aa<0)]<-0
 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
 vd<-(nimp-1)*(1+rd^(-1))^2
 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
 #-All chi-sq we collected have df=1-------
 pvald<-pf(Dm,1,vd,lower.tail=FALSE)
 ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
 names(ccc)<-c("Fstat","df","pvalue")
 
 bbb
 r2
 ccc
 
 #Corrected to publication version with depvarname
 #--write results to csv file for perusal in spreadsheet--
 write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
 write.csv(bbb,file="OLSresults.csv",append=TRUE)
 write.csv(r2,file="OLSresults.csv",append=TRUE)
 write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] A! So drop everything but your malesexag and superjh, lowest p<.20

Program 1 --> Program 2
#MI--make the imputed datasets
#--change the following path to the directory with your data and program--
setwd("C:/My Documents/MI")
rm(list=ls(all=TRUE))
options(echo=TRUE)
#--you need the following two packages--you must install them first--
library(foreign)
library(mice)
library(tripak)
library(zoo)
library(sp)
library(maptools)
library(spam)

#--To find the citation for a package, use this function:---
citation("mice")

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in auxiliary variables---
load("vaux.Rdata",.GlobalEnv)
row.names(vaux)<-NULL
#--Read in the SCCS dataset---
load("SCCS.Rdata",.GlobalEnv)

#--look at first 6 rows of vaux--
head(vaux)
#--look at field names of vaux--
names(vaux)
#--check to see that rows are properly aligned in the two datasets--
#--sum should equal 186---
sum((SCCS$socname==vaux$socname)*1)
#--remove the society name field--
vaux<-vaux[,-28]
names(vaux)

#--Two nominal variables: brg and rlg----
#--brg: consolidated Burton  Regions-----
#0 = (rest of world) circumpolar, South and Meso-America, west North America
#1 = Subsaharan Africa
#2 = Middle Old World
#3 = Southeast Asia, Insular Pacific, Sahul
#4 = Eastern Americas
#--rlg: Religion---
#'0 (no world religion)'  
#'1 (Christianity)'  
#'2 (Islam)'  
#'3 (Hindu/Buddhist)'  

#--check to see number of missing values in vaux, 
#--whether variables are numeric,
#--and number of discrete values for each variable---
vvn<-names(vaux)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(vaux[,vvn[i]])))
numeric<-is.numeric(vaux[,vvn[i]])
numDiscrVals<-length(table(vaux[,vvn[i]]))
pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
}
row.names(pp)<-vvn
pp

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--extract variables to be used from SCCS, put in dataframe fx--
fx<-data.frame(
socname=SCCS$socname,socID=SCCS$"sccs#",
valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
dateobs=SCCS$v838,cultints=SCCS$v232,roots=(SCCS$v233==5)*1,
cereals=(SCCS$v233==6)*1,gath=SCCS$v203,hunt=SCCS$v204,
fish=SCCS$v205,anim=SCCS$v206,femsubs=SCCS$v890,
pigs=(SCCS$v244==2)*1,milk=(SCCS$v245>1)*1,plow=(SCCS$v243>1)*1,
bovines=(SCCS$v244==7)*1,tree=(SCCS$v233==4)*1,
foodtrade=SCCS$v819,foodscarc=SCCS$v1685,
ecorich=SCCS$v857,popdens=SCCS$v156,pathstress=SCCS$v1260,
CVrain=SCCS$v1914/SCCS$v1913,rain=SCCS$v854,temp=SCCS$v855,
AP1=SCCS$v921,AP2=SCCS$v928,ndrymonth=SCCS$v196,
exogamy=SCCS$v72,ncmallow=SCCS$v227,famsize=SCCS$v80,
settype=SCCS$v234,localjh=(SCCS$v236-1),superjh=SCCS$v237,
moralgods=SCCS$v238,fempower=SCCS$v663,
sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
war=SCCS$v1648,himilexp=(SCCS$v899==1)*1,
money=SCCS$v155,wagelabor=SCCS$v1732,
migr=(SCCS$v677==2)*1,brideprice=(SCCS$v208==1)*1,
nuclearfam=(SCCS$v210<=3)*1,pctFemPolyg=SCCS$v872,
nonmatrel=SCCS$v52,lrgfam=SCCS$v68,malesexag=SCCS$v175,
segadlboys=SCCS$v242,agrlateboy=SCCS$v300) ###ADDED

#--look at first 6 rows of fx--
head(fx)

#--check to see number of missing values--
#--also check whether numeric--
vvn<-names(fx)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(fx[,vvn[i]])))
numeric<-is.numeric(fx[,vvn[i]])
pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
}
row.names(pp)<-vvn
pp

#--identify variables with missing values--
z<-which(pp[,1]>0)
zv1<-vvn[z]
zv1
#--identify variables with non-missing values--
z<-which(pp[,1]==0)
zv2<-vvn[z]
zv2

#-----------------------------
#----Multiple imputation------
#-----------------------------

#--number of imputed data sets to create--
nimp<-10
#--one at a time, loop through those variables with missing values--
for (i in 1:length(zv1)){
#--attach the imputand to the auxiliary data--
zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
#--in the following line, the imputation is done--
aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")
#--during first iteration of the loop, create dataframe impdat--
if (i==1){
impdat<-data.frame(aqq[,c(".id",".imp")])
}
#--the imputand is placed as a field in impdat and named--
impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
names(impdat)[NCOL(impdat)]<-zv1[i]
}

#--now the non-missing variables are attached to impdat--
gg<-NULL
for (i in 1:nimp){
gg<-rbind(gg,data.frame(fx[,zv2]))
}
impdat<-cbind(impdat,gg)

#--take a look at the top 6 and bottom 6 rows of impdat--
head(impdat)
tail(impdat)

#--impdat is saved as an R-format data file--
save(impdat,file="impdat.Rdata")


 
Program 2
#MI--estimate model with network-lagged dependent variables, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)

#HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#--can replace "sum" with "max"
###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) 
depvar<-SCCS$v667###NEW
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
###depvarname<-"childvar"
depvarname<-"rape"
#--can add additional SCCS variable, but only if it has no missing values---
#dateobs<-SCCS$v838
#dateobs<-dateobs[zdv]

#--look at frequencies and quartiles for the dep. varb.--
summary(depvar)
table(depvar)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
"fempower","sexratio","war","himilexp","wagelabor","famsize","settype",
"localjh","money","cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade","dateobs",
"ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
"temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg",
"nonmatrel","lrgfam","malesexag","segadlboys","agrlateboy") ###ADDED

#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--create spatially lagged dep. varbs. in stage 1 OLS--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx   
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+dateobs+            
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg+
nonmatrel+lrgfam+malesexag+segadlboys+agrlateboy,data=m9) ###ADDED 
#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of restricted model--
###xR<-lm(depvar ~ fyll + cultints + roots + fish + 
###    exogamy + settype + femsubs, data = m9)
   xR<-lm(depvar~fyll+fydd+ ###
superjh+                    ###
malesexag+agrlateboy        ###
, data = m9) 
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim","dateobs",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"fydd","wagelabor","war","himilexp","tree","foodtrade")


#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(e)$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc
#Corrected to publication version with depvarname
#--write results to csv file for perusal in spreadsheet--
write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] A! So drop everything ... but add your variables

WAR WAS ENTERED TWICE - CORRECTED

+ postsextab2=SCCS$v34,mavofemsex=SCCS$v672,wivhotgr=SCCS$v668)    ###ADDED
Error: unexpected symbol in:
"segadlboys=SCCS$v242,agrlateboy=SCCS$v300                        ###ADDED
postsextab2"
+  p7<-cr2
+  dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
+  
+  }
Error in `[.data.frame`(m9, , indpv) : 
undefined columns selected
Error in as.matrix(m9[, indpv]) : 
 error in evaluating the argument 'x' in selecting a method for function 'as.matrix'
THIS ERROR COULD BE IN ONE OF THE NEW VARIABLE FIELDS, NOT IN THE Ramsey RESET
IT COULD ALSO BE IN x<- fydd or fyll
Program 1 --> Program 2
#MI--make the imputed datasets
#--change the following path to the directory with your data and program--
setwd("C:/My Documents/MI")###)NO SECOND PARENS
rm(list=ls(all=TRUE))
options(echo=TRUE)
#--you need the following two packages--you must install them first--
library(foreign)
library(mice)
library(tripak)
library(zoo)
library(sp)
library(maptools)
library(spam)

#--To find the citation for a package, use this function:---
citation("mice")

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in auxiliary variables---
load("vaux.Rdata",.GlobalEnv)
row.names(vaux)<-NULL
#--Read in the SCCS dataset---
load("SCCS.Rdata",.GlobalEnv)

#--look at first 6 rows of vaux--
head(vaux)
#--look at field names of vaux--
names(vaux)
#--check to see that rows are properly aligned in the two datasets--
#--sum should equal 186---
sum((SCCS$socname==vaux$socname)*1)
#--remove the society name field--
vaux<-vaux[,-28]
names(vaux)

#--Two nominal variables: brg and rlg----
#--brg: consolidated Burton  Regions-----
#0 = (rest of world) circumpolar, South and Meso-America, west North America
#1 = Subsaharan Africa
#2 = Middle Old World
#3 = Southeast Asia, Insular Pacific, Sahul
#4 = Eastern Americas
#--rlg: Religion---
#'0 (no world religion)'  
#'1 (Christianity)'  
#'2 (Islam)'  
#'3 (Hindu/Buddhist)'  

#--check to see number of missing values in vaux, 
#--whether variables are numeric,
#--and number of discrete values for each variable---
vvn<-names(vaux)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(vaux[,vvn[i]])))
numeric<-is.numeric(vaux[,vvn[i]])
numDiscrVals<-length(table(vaux[,vvn[i]]))
pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
}
row.names(pp)<-vvn
pp

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--extract variables to be used from SCCS, put in dataframe fx--
fx<-data.frame(
socname=SCCS$socname,socID=SCCS$"sccs#",
valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
dateobs=SCCS$v838,cultints=SCCS$v232,roots=(SCCS$v233==5)*1,
cereals=(SCCS$v233==6)*1,gath=SCCS$v203,hunt=SCCS$v204,
fish=SCCS$v205,anim=SCCS$v206,femsubs=SCCS$v890,
pigs=(SCCS$v244==2)*1,milk=(SCCS$v245>1)*1,plow=(SCCS$v243>1)*1,
bovines=(SCCS$v244==7)*1,tree=(SCCS$v233==4)*1,
foodtrade=SCCS$v819,foodscarc=SCCS$v1685,
ecorich=SCCS$v857,popdens=SCCS$v156,pathstress=SCCS$v1260,
CVrain=SCCS$v1914/SCCS$v1913,rain=SCCS$v854,temp=SCCS$v855,
AP1=SCCS$v921,AP2=SCCS$v928,ndrymonth=SCCS$v196,
exogamy=SCCS$v72,ncmallow=SCCS$v227,famsize=SCCS$v80,
settype=SCCS$v234,localjh=(SCCS$v236-1),superjh=SCCS$v237,
moralgods=SCCS$v238,fempower=SCCS$v663,
sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
war=SCCS$v1648,himilexp=(SCCS$v899==1)*1,
money=SCCS$v155,wagelabor=SCCS$v1732,
migr=(SCCS$v677==2)*1,brideprice=(SCCS$v208==1)*1,
nuclearfam=(SCCS$v210<=3)*1,pctFemPolyg=SCCS$v872,
nonmatrel=SCCS$v52,lrgfam=SCCS$v68,malesexag=SCCS$v175,          ###ADDED
segadlboys=SCCS$v242,agrlateboy=SCCS$v300,                        ###ADDED
postsextab2=SCCS$v34,mavofemsex=SCCS$v672,wivhotgr=SCCS$v668)    ###ADDED

#--look at first 6 rows of fx--
head(fx)

#--check to see number of missing values--
#--also check whether numeric--
vvn<-names(fx)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(fx[,vvn[i]])))
numeric<-is.numeric(fx[,vvn[i]])
pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
}
row.names(pp)<-vvn
pp

#--identify variables with missing values--
z<-which(pp[,1]>0)
zv1<-vvn[z]
zv1
#--identify variables with non-missing values--
z<-which(pp[,1]==0)
zv2<-vvn[z]
zv2

#-----------------------------
#----Multiple imputation------
#-----------------------------

#--number of imputed data sets to create--
nimp<-10
#--one at a time, loop through those variables with missing values--
for (i in 1:length(zv1)){
#--attach the imputand to the auxiliary data--
zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
#--in the following line, the imputation is done--
aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")
#--during first iteration of the loop, create dataframe impdat--
if (i==1){
impdat<-data.frame(aqq[,c(".id",".imp")])
}
#--the imputand is placed as a field in impdat and named--
impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
names(impdat)[NCOL(impdat)]<-zv1[i]
}

#--now the non-missing variables are attached to impdat--
gg<-NULL
for (i in 1:nimp){
gg<-rbind(gg,data.frame(fx[,zv2]))
}
impdat<-cbind(impdat,gg)

#--take a look at the top 6 and bottom 6 rows of impdat--
head(impdat)
tail(impdat)

#--impdat is saved as an R-format data file--
save(impdat,file="impdat.Rdata")


 
Program 2
#MI--estimate model with network-lagged dependent variables, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)

#HERE YOU CHANGE HOW THE DEPENDENT VARIABLE IS COMPUTED FOR A NEW PROJECT
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#--can replace "sum" with "max"
###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) 
depvar<-SCCS$v667###NEW
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
###depvarname<-"childvar"
depvarname<-"rape"
#--can add additional SCCS variable, but only if it has no missing values---
#dateobs<-SCCS$v838
#dateobs<-dateobs[zdv]

#--look at frequencies and quartiles for the dep. varb.--
summary(depvar)
table(depvar)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
"fempower","sexratio","war","himilexp","wagelabor","famsize","settype",
"localjh","money","cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade","dateobs",
"ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
"temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg",
 "nonmatrel","lrgfam","malesexag",
 "segadlboys","agrlateboy","postsextab",        ###ADDED
 "postsextab2","mavofemsex","wivhotgr")         ###ADDED
 ###"war"- twice)  ###"maleagr","fempolpar","femsolgro")

#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--create spatially lagged dep. varbs. in stage 1 OLS--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx   
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+dateobs+            
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg+
 nonmatrel+lrgfam+malesexag+            ###ADDED
 segadlboys+agrlateboy+postsextab+      ###ADDED
 postsextab2+mavofemsex+wivhotgr       ###ADDED
,data=m9) 
 ###war   ###maleagr++fempolpar+femsolgro        ###DELETED
#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--Stage 2 OLS estimate of restricted model--
###xR<-lm(depvar ~ fyll + cultints + roots + fish + 
###    exogamy + settype + femsubs, data = m9)
    xR<-lm(depvar~fyll+fydd
 +nonmatrel+lrgfam+malesexag+
 segadlboys+agrlateboy+postsextab+
 postsextab2+mavofemsex+wivhotgr   ###freintovio+ideomaltough+maleseg
, data = m9) ###ADDED 
 ###war   ###maleagr+fempolpar+femsolgro
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))

#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim","dateobs",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"fydd","wagelabor","war","himilexp","tree","foodtrade")


#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(e)$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc
#Corrected to publication version with depvarname
#--write results to csv file for perusal in spreadsheet--
write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] From here go to a debug EduMod-36 by DRW

EduMod-36: (your dep var here) Imputation and Regression