EduMod 2: 2-Estimate model, combine results

From InterSciWiki

Jump to: navigation, search

Contents

[edit] 1

EduMod-2: polygyny Imputation and Regression

[edit] 2

Deborah Blumenthal's problem Doug 07:33, 7 October 2009 (PDT) Will help you work on this

[edit] 3 Results

White and Burton 1988" including "marriage of captives, fraternal interest groups, and the plow" account for 42% of the variance in polygyny. The ccc statistics show that the model is underspecified so two of those variables could be added to this model. There are also residual errors that cluster by distance, another indicator that important variables are missing.

  bbb
             coef  Fstat       ddf pvalue   VIF
(Intercept) -0.733  0.100   480.283  0.752    NA
fyll        -0.214  0.143  3367.741  0.706 2.756
fydd         0.930 17.453  3306.424  0.000 3.097
dateobs      0.000  0.273 11750.596  0.601 1.304
cultints    -0.003  0.001 10729.263  0.981 5.075
roots        0.470  1.044  6407.076  0.307 5.009
cereals      0.318  0.504 11797.284  0.478 7.452
gath         0.040  0.170  7465.003  0.680 3.151
plow        -0.845  5.184  5121.842  0.023 2.970
hunt         0.207  3.868 15849.343  0.049 4.924
fish         0.042  0.289  5396.876  0.591 2.938
anim         0.185  4.210  5965.557  0.040 5.395
pigs        -0.631  3.430 37924.964  0.064 2.372
milk        -0.599  2.668 15472.662  0.102 4.279
bovines      0.354  1.003  3449.520  0.317 4.232
tree         0.309  0.290 40506.804  0.590 3.231
foodtrade   -0.016  2.914 47945.795  0.088 1.468
foodscarc   -0.002  0.001   260.688  0.978 1.281
ecorich     -0.046  0.274  5658.579  0.601 1.797
popdens      0.127  1.663 79272.755  0.197 3.575
pathstress   0.045  1.800  2911.158  0.180 2.325
exogamy      0.056  0.479 85943.144  0.489 1.412
ncmallow    -0.001  0.000  5744.644  0.987 1.438
famsize     -0.074  0.802 11691.405  0.370 1.560
settype      0.040  0.334  1752.016  0.563 4.011
localjh      0.149  0.726  7102.702  0.394 1.623
superjh     -0.112  1.071  1815.197  0.301 2.625
moralgods   -0.017  0.028   721.948  0.867 2.020
fempower    -0.090  2.357   169.216  0.127 1.367
femsubs      0.109  2.390 12394.014  0.122 1.694
sexratio    -0.031  0.016    22.175  0.900 1.339
war          0.033  5.330  1533.899  0.021 1.380
himilexp     0.367  2.814   241.040  0.095 1.470
money        0.065  0.653 34406.641  0.419 2.128
wagelabor    0.036  0.074    98.540  0.786 1.490
  r2
R2:final model R2:IV(distance) R2:IV(language) 
     0.4826739       0.9784302       0.9737484 
  ccc
             Fstat         df pvalue
RESET         3.884    865.652  0.049 signif specification error of regression
F on restrs.  5.845    697.947  0.016 some dropped variables coef. > 0
NCV          -0.022    479.126  1.000
SWnormal      1.917    643.745  0.167
lagll         1.782 470790.718  0.182
lagdd         5.174  51564.004  0.023 residual errors distance-autocorrelated


  bbb - reordered by type of effect
             coef  Fstat       ddf pvalue   VIF
(Intercept) -0.733  0.100   480.283  0.752    NA
fyll        -0.214  0.143  3367.741  0.706 2.756
fydd         0.930 17.453  3306.424  0.000 3.097 polygyny
hunt         0.207  3.868 15849.343  0.049 4.924 polygyny
anim         0.185  4.210  5965.557  0.040 5.395 polygyny war          0.033  5.330  1533.899  0.021 1.380 polygyny
himilexp     0.367  2.814   241.040  0.095 1.470 polygyny - low military expectations 899
femsubs      0.109  2.390 12394.014  0.122 1.694 polygyny
fempower    -0.090  2.357   169.216  0.127 1.367  monogamy
plow        -0.845  5.184  5121.842  0.023 2.970  monogamy
pigs        -0.631  3.430 37924.964  0.064 2.372  monogamy
milk        -0.599  2.668 15472.662  0.102 4.279  monogamy
foodtrade   -0.016  2.914 47945.795  0.088 1.468  monogamy
Editing EduMod 2: 2-Estimate model, combine results (section)
From InterSciWiki
Jump to: navigation, search

[edit] New Results from revised program 2 below)

> bbb
              coef  Fstat       ddf pvalue   VIF
(Intercept) -1.249  0.333   649.049  0.564    NA
fyll        -0.245  0.173  5724.290  0.677 2.767
fydd         0.959 17.330  9412.027  0.000 3.098
cultints     0.004  0.001  4112.076  0.971 5.050
roots        0.492  1.079 56298.457  0.299 4.968
cereals      0.344  0.551 29504.787  0.458 7.342
gath         0.048  0.227  9040.855  0.633 3.082
plow        -0.828  4.571  4807.793  0.033 2.961
hunt         0.208  3.537  5906.646  0.060 4.918
fish         0.047  0.350 20860.794  0.554 2.870
anim         0.188  4.000  5722.401  0.046 5.399
pigs        -0.652  3.292 10317.708  0.070 2.395
milk        -0.646  2.787  4578.696  0.095 4.295
bovines      0.361  0.970 17926.851  0.325 4.268
tree         0.354  0.354 32700.272  0.552 3.188
foodtrade   -0.016  2.467 57530.121  0.116 1.461
foodscarc   -0.007  0.005    59.198  0.942 1.259
ecorich     -0.038  0.168  8364.699  0.682 1.812
popdens      0.123  1.391  4208.317  0.238 3.519
pathstress   0.041  1.347  3524.415  0.246 2.326
exogamy      0.063  0.532  2666.218  0.466 1.407
ncmallow    -0.001  0.000   711.209  0.986 1.464
famsize     -0.068  0.616  3687.550  0.433 1.545
settype      0.035  0.237 13194.384  0.626 3.995
localjh      0.139  0.591 14075.452  0.442 1.614
superjh     -0.106  0.911  7536.585  0.340 2.636
moralgods   -0.010  0.009  4854.136  0.923 1.979
fempower    -0.104  2.821   111.975  0.096 1.328
femsubs      0.107  2.137  6270.723  0.144 1.677
sexratio    -0.008  0.002    53.246  0.969 1.377
war          0.030  4.096  1673.933  0.043 1.366
himilexp     0.425  3.368   215.052  0.068 1.496
money        0.070  0.657  2569.946  0.418 2.122
wagelabor    0.034  0.058   114.294  0.810 1.509
> r2
 R2:final model R2:IV(distance) R2:IV(language) 
      0.4320950       0.9782727       0.9735367 
> ccc
                Fstat         df pvalue
RESET           3.536   1296.272  0.060
Wald on restrs. 8.630    350.689  0.004
NCV             0.078   1270.227  0.780
SWnormal        1.146    311.476  0.285
lagll           1.741 197145.781  0.187
lagdd           5.489 143627.342  0.019
SORT
              coef  Fstat       ddf pvalue   VIF
(Intercept) -1.249  0.333   649.049  0.564    NA
fyll        -0.245  0.173  5724.290  0.677 2.767
cultints     0.004  0.001  4112.076  0.971 5.050
roots        0.492  1.079 56298.457  0.299 4.968
cereals      0.344  0.551 29504.787  0.458 7.342
gath         0.048  0.227  9040.855  0.633 3.082
fish         0.047  0.350 20860.794  0.554 2.870
bovines      0.361  0.970 17926.851  0.325 4.268
tree         0.354  0.354 32700.272  0.552 3.188
foodtrade   -0.016  2.467 57530.121  0.116 1.461
foodscarc   -0.007  0.005    59.198  0.942 1.259
ecorich     -0.038  0.168  8364.699  0.682 1.812
popdens      0.123  1.391  4208.317  0.238 3.519
pathstress   0.041  1.347  3524.415  0.246 2.326
exogamy      0.063  0.532  2666.218  0.466 1.407
ncmallow    -0.001  0.000   711.209  0.986 1.464
famsize     -0.068  0.616  3687.550  0.433 1.545
settype      0.035  0.237 13194.384  0.626 3.995
localjh      0.139  0.591 14075.452  0.442 1.614
superjh     -0.106  0.911  7536.585  0.340 2.636
moralgods   -0.010  0.009  4854.136  0.923 1.979
femsubs      0.107  2.137  6270.723  0.144 1.677
sexratio    -0.008  0.002    53.246  0.969 1.377
money        0.070  0.657  2569.946  0.418 2.122
wagelabor    0.034  0.058   114.294  0.810 1.509


fydd         0.959 17.330  9412.027  0.000 3.098
plow        -0.828  4.571  4807.793  0.033 2.961
hunt         0.208  3.537  5906.646  0.060 4.918
milk        -0.646  2.787  4578.696  0.095 4.295
anim         0.188  4.000  5722.401  0.046 5.399
pigs        -0.652  3.292 10317.708  0.070 2.395
fempower    -0.104  2.821   111.975  0.096 1.328
war          0.030  4.096  1673.933  0.043 1.366
himilexp     0.425  3.368   215.052  0.068 1.496

[edit] Program 2 (9-17-2009) obsolete

#September 17, 2009, Anthon Eff
#MI--estimate model, combine results

rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
#setwd("d:/projects/MI")
setwd("c:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
load(url("http://frank.mtsu.edu/~eaeff/downloads/SCCS.Rdata"),.GlobalEnv)
#--Read in two weight matrices--
###ll<-as.matrix(read.dta("http://frank.mtsu.edu/~eaeff/downloads/langwm.dta")[,-1])
###changed to 
ll<-as.matrix(read.dta("langwm.dta")[,-1])
###dd<-as.matrix(read.dta("http://frank.mtsu.edu/~eaeff/downloads/dist25wm.dta")[,c(-1,-2,-189)])
###changed to 
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load(url("http://frank.mtsu.edu/~eaeff/downloads/impdat.Rdata"),.GlobalEnv)

#--create dep.varb. you wish to use from SCCS data--
###--Here we sum variables measuring how much a society values children--  
###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
###DRW modified depvar
depvar<-SCCS$v860 ###DRW
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]

#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))


indpv<-c("femsubs","foodscarc",
"exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
"war","himilexp","wagelabor",
"famsize","settype","localjh","money",
"cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich",
"popdens","pathstress","CVrain","rain","temp","AP1","AP2")


#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cy<-dd%*%y
o<-lm(cy~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xdy<-ll%*%xx   
cy<-ll%*%y
o<-lm(cy~xdy)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#--OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+dateobs+  ###DRW added dateobs
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)

#--OLS estimate of restricted model--
###xR<-lm(depvar~fyll+fydd+settype+cultints+
###roots+foodtrade+exogamy+
###femsubs+fish, data=m9)
###DRW altered   ###DRW added dateobs
xR<-lm(depvar~fyll+fydd+dateobs+  
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*ov$sigma^2))
#--collect robust coef. variances when there is heteroskedasticity--
#ss<-rbind(ss,diag(hccm(xR)))
beta<-rbind(beta,coef(xR))

#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"wagelabor","war","himilexp","tree")

#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--F test that dropped variables had coefficient equal zero--
o<-linear.hypothesis(xUR,dropt,white.adjust=TRUE)$"Pr(>F)"[2]
p2<-qchisq(o,1,lower.tail=FALSE)
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-ov$r.squared
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}


#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","F on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc

#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] Program 2 (9-19-2009) obsolete

#September 19, 2009, Anthon Eff 
#MI--estimate model with network-lagged dependent variables, combine results

rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
###setwd("d:/projects/MI")
setwd("c:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
###ll<-as.matrix(read.dta("langwm.dta")[,-1])
###changed to
ll<-as.matrix(read.dta("langwm.dta")[,-1])
###dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
###changed to
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)

#--create dep.varb. you wish to use from SCCS data--
###--Here we sum variables measuring how much a society values children-- 
###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
###DRW modified depvar
depvar<-SCCS$v860  ###DRW
dateobs<-SCCS$v838 ###DRW
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
dateobs<-dateobs[zdv]

#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))


indpv<-c("femsubs","foodscarc",
"exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
"war","himilexp","wagelabor",
"famsize","settype","localjh","money",
"cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich",
"popdens","pathstress","CVrain","rain","temp","AP1","AP2")


#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y     ###EAE changed
o<-lm(cyd~xdy)  ###EAE changed
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xdy<-ll%*%xx    ###EAE changed
cyl<-ll%*%y     ###EAE changed
o<-lm(cyl~xdy)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#--OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+dateobs+  ###DRW added dateobs
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)

#--OLS estimate of restricted model--
###xR<-lm(depvar~fyll+fydd+settype+cultints+
###roots+foodtrade+exogamy+
###femsubs+fish, data=m9)
###DRW altered   ###DRW added dateobs
xR<-lm(depvar~fyll+fydd+dateobs+ 
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*ov$sigma^2))
#--collect robust coef. variances when there is heteroskedasticity--
#ss<-rbind(ss,diag(hccm(xR)))
beta<-rbind(beta,coef(xR))

#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"wagelabor","war","himilexp","tree")

#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--F test that dropped variables had coefficient equal zero--
o<-linear.hypothesis(xUR,dropt,white.adjust=TRUE)$"Pr(>F)"[2]
p2<-qchisq(o,1,lower.tail=FALSE)
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-ov$r.squared
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}


#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","F on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc

#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] Program 2 (9-22-2009) current

#MI--estimate model with network-lagged dependent variables, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
###setwd("d:/projects/MI")
setwd("c:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)

#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#--can replace "sum" with "max"
childvalue<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) 
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(childvalue))
childvalue<-childvalue[zdv]

#--look at frequencies and quartiles for the dep. varb.--
summary(childvalue)
table(childvalue)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))

indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
"fempower","sexratio","war","himilexp","wagelabor","famsize","settype",
"localjh","money","cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
"temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg")

#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#--create spatially lagged dep. varbs.--
y<-as.matrix(childvalue)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx   
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#--OLS estimate of unrestricted model--
xUR<-lm(childvalue~fyll+fydd+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg
,data=m9)

#--OLS estimate of restricted model--
xR<-lm(childvalue ~ fyll + cultints + roots + fish + 
    exogamy + settype + femsubs, data = m9)

#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))

#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"fydd","wagelabor","war","himilexp","tree","foodtrade")


#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE)
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}

#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc

#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for childvalue==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)



[edit] Program 2 (9-22-2009) rev 2

#MI--estimate model, combine results

rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
###setwd("d:/projects/MI")
setwd("c:/My Documents/MI")
options(echo=TRUE)

#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)

#-----------------------------
#--Read in data, rearrange----
#-----------------------------

#--Read in original SCCS data---
###load(url("http://frank.mtsu.edu/~eaeff/downloads/SCCS.Rdata"),.GlobalEnv)
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
###ll<-as.matrix(read.dta("http://frank.mtsu.edu/~eaeff/downloads/langwm.dta")[,-1])
ll<-as.matrix(read.dta("langwm.dta")[,-1])
###dd<-as.matrix(read.dta("http://frank.mtsu.edu/~eaeff/downloads/dist25wm.dta")[,c(-1,-2,-189)])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
###load(url("http://frank.mtsu.edu/~eaeff/downloads/impdat.Rdata"),.GlobalEnv)
load("impdat.Rdata",.GlobalEnv)

#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
depvar<-SCCS$v860       ###new polygyny variable
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
dateobs<-SCCS$v838      ###DRW NEW
dateobs<-dateobs[zdv]   ###DRW NEW
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)

#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))


indpv<-c("femsubs","foodscarc",
"exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
"war","himilexp","wagelabor",
"famsize","settype","localjh","money",
"cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich",
"popdens","pathstress","CVrain","rain","temp","AP1","AP2")


#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------

#--number of imputed datasets--
nimp<-10

#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL

#--loop through the imputed datasets--
for (i in 1:nimp){

#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]

#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx   
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)

#--OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+  ###had been extra + (++) here and below
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)

#--OLS estimate of restricted model--
### xR<-lm(depvar~fyll+fydd+settype+cultints+
### roots+foodtrade+exogamy+
### femsubs+fish, data=m9)
### DRW ADDED ALL THE VARIABLES BELOW TO REPLACE THOSE ABOVE: WHY SO RESTRICTED?

#--OLS estimate of restricted model--
xR<-lm(depvar~fyll+fydd+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)


#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))

#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))

#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"wagelabor","war","himilexp","tree")


#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))

}


#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------

#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")

#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")

bbb
r2
ccc

#Corrected to publication version with depvarname
#--write results to csv file for perusal in spreadsheet--
write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)

[edit] Follow up

What your R results are still on the screen
Paste everything from xR to the end into the Polygyny follow-up and edit out the nonsignificant variables, then rerun while R is still open.
(Dont use this file but make a copy)

Deborah Blumenthal's problem Doug 07:33, 7 October 2009 (PDT) Will help you work on this