EduMod 2: 2-Estimate model, combine results
From InterSciWiki
Contents |
[edit] 1
EduMod-2: polygyny Imputation and Regression
[edit] 2
Deborah Blumenthal's problem Doug 07:33, 7 October 2009 (PDT) Will help you work on this
[edit] 3 Results
White and Burton 1988" including "marriage of captives, fraternal interest groups, and the plow" account for 42% of the variance in polygyny. The ccc statistics show that the model is underspecified so two of those variables could be added to this model. There are also residual errors that cluster by distance, another indicator that important variables are missing.
bbb
coef Fstat ddf pvalue VIF
(Intercept) -0.733 0.100 480.283 0.752 NA
fyll -0.214 0.143 3367.741 0.706 2.756
fydd 0.930 17.453 3306.424 0.000 3.097
dateobs 0.000 0.273 11750.596 0.601 1.304
cultints -0.003 0.001 10729.263 0.981 5.075
roots 0.470 1.044 6407.076 0.307 5.009
cereals 0.318 0.504 11797.284 0.478 7.452
gath 0.040 0.170 7465.003 0.680 3.151
plow -0.845 5.184 5121.842 0.023 2.970
hunt 0.207 3.868 15849.343 0.049 4.924
fish 0.042 0.289 5396.876 0.591 2.938
anim 0.185 4.210 5965.557 0.040 5.395
pigs -0.631 3.430 37924.964 0.064 2.372
milk -0.599 2.668 15472.662 0.102 4.279
bovines 0.354 1.003 3449.520 0.317 4.232
tree 0.309 0.290 40506.804 0.590 3.231
foodtrade -0.016 2.914 47945.795 0.088 1.468
foodscarc -0.002 0.001 260.688 0.978 1.281
ecorich -0.046 0.274 5658.579 0.601 1.797
popdens 0.127 1.663 79272.755 0.197 3.575
pathstress 0.045 1.800 2911.158 0.180 2.325
exogamy 0.056 0.479 85943.144 0.489 1.412
ncmallow -0.001 0.000 5744.644 0.987 1.438
famsize -0.074 0.802 11691.405 0.370 1.560
settype 0.040 0.334 1752.016 0.563 4.011
localjh 0.149 0.726 7102.702 0.394 1.623
superjh -0.112 1.071 1815.197 0.301 2.625
moralgods -0.017 0.028 721.948 0.867 2.020
fempower -0.090 2.357 169.216 0.127 1.367
femsubs 0.109 2.390 12394.014 0.122 1.694
sexratio -0.031 0.016 22.175 0.900 1.339
war 0.033 5.330 1533.899 0.021 1.380
himilexp 0.367 2.814 241.040 0.095 1.470
money 0.065 0.653 34406.641 0.419 2.128
wagelabor 0.036 0.074 98.540 0.786 1.490
r2
R2:final model R2:IV(distance) R2:IV(language)
0.4826739 0.9784302 0.9737484
ccc
Fstat df pvalue
RESET 3.884 865.652 0.049 signif specification error of regression
F on restrs. 5.845 697.947 0.016 some dropped variables coef. > 0
NCV -0.022 479.126 1.000
SWnormal 1.917 643.745 0.167
lagll 1.782 470790.718 0.182
lagdd 5.174 51564.004 0.023 residual errors distance-autocorrelated
bbb - reordered by type of effect
coef Fstat ddf pvalue VIF
(Intercept) -0.733 0.100 480.283 0.752 NA
fyll -0.214 0.143 3367.741 0.706 2.756
fydd 0.930 17.453 3306.424 0.000 3.097 polygyny
hunt 0.207 3.868 15849.343 0.049 4.924 polygyny
anim 0.185 4.210 5965.557 0.040 5.395 polygyny war 0.033 5.330 1533.899 0.021 1.380 polygyny
himilexp 0.367 2.814 241.040 0.095 1.470 polygyny - low military expectations 899
femsubs 0.109 2.390 12394.014 0.122 1.694 polygyny
fempower -0.090 2.357 169.216 0.127 1.367 monogamy
plow -0.845 5.184 5121.842 0.023 2.970 monogamy
pigs -0.631 3.430 37924.964 0.064 2.372 monogamy
milk -0.599 2.668 15472.662 0.102 4.279 monogamy
foodtrade -0.016 2.914 47945.795 0.088 1.468 monogamy
Editing EduMod 2: 2-Estimate model, combine results (section) From InterSciWiki Jump to: navigation, search
[edit] New Results from revised program 2 below)
> bbb
coef Fstat ddf pvalue VIF
(Intercept) -1.249 0.333 649.049 0.564 NA
fyll -0.245 0.173 5724.290 0.677 2.767
fydd 0.959 17.330 9412.027 0.000 3.098
cultints 0.004 0.001 4112.076 0.971 5.050
roots 0.492 1.079 56298.457 0.299 4.968
cereals 0.344 0.551 29504.787 0.458 7.342
gath 0.048 0.227 9040.855 0.633 3.082
plow -0.828 4.571 4807.793 0.033 2.961
hunt 0.208 3.537 5906.646 0.060 4.918
fish 0.047 0.350 20860.794 0.554 2.870
anim 0.188 4.000 5722.401 0.046 5.399
pigs -0.652 3.292 10317.708 0.070 2.395
milk -0.646 2.787 4578.696 0.095 4.295
bovines 0.361 0.970 17926.851 0.325 4.268
tree 0.354 0.354 32700.272 0.552 3.188
foodtrade -0.016 2.467 57530.121 0.116 1.461
foodscarc -0.007 0.005 59.198 0.942 1.259
ecorich -0.038 0.168 8364.699 0.682 1.812
popdens 0.123 1.391 4208.317 0.238 3.519
pathstress 0.041 1.347 3524.415 0.246 2.326
exogamy 0.063 0.532 2666.218 0.466 1.407
ncmallow -0.001 0.000 711.209 0.986 1.464
famsize -0.068 0.616 3687.550 0.433 1.545
settype 0.035 0.237 13194.384 0.626 3.995
localjh 0.139 0.591 14075.452 0.442 1.614
superjh -0.106 0.911 7536.585 0.340 2.636
moralgods -0.010 0.009 4854.136 0.923 1.979
fempower -0.104 2.821 111.975 0.096 1.328
femsubs 0.107 2.137 6270.723 0.144 1.677
sexratio -0.008 0.002 53.246 0.969 1.377
war 0.030 4.096 1673.933 0.043 1.366
himilexp 0.425 3.368 215.052 0.068 1.496
money 0.070 0.657 2569.946 0.418 2.122
wagelabor 0.034 0.058 114.294 0.810 1.509
> r2
R2:final model R2:IV(distance) R2:IV(language)
0.4320950 0.9782727 0.9735367
> ccc
Fstat df pvalue
RESET 3.536 1296.272 0.060
Wald on restrs. 8.630 350.689 0.004
NCV 0.078 1270.227 0.780
SWnormal 1.146 311.476 0.285
lagll 1.741 197145.781 0.187
lagdd 5.489 143627.342 0.019
SORT
coef Fstat ddf pvalue VIF
(Intercept) -1.249 0.333 649.049 0.564 NA
fyll -0.245 0.173 5724.290 0.677 2.767
cultints 0.004 0.001 4112.076 0.971 5.050
roots 0.492 1.079 56298.457 0.299 4.968
cereals 0.344 0.551 29504.787 0.458 7.342
gath 0.048 0.227 9040.855 0.633 3.082
fish 0.047 0.350 20860.794 0.554 2.870
bovines 0.361 0.970 17926.851 0.325 4.268
tree 0.354 0.354 32700.272 0.552 3.188
foodtrade -0.016 2.467 57530.121 0.116 1.461
foodscarc -0.007 0.005 59.198 0.942 1.259
ecorich -0.038 0.168 8364.699 0.682 1.812
popdens 0.123 1.391 4208.317 0.238 3.519
pathstress 0.041 1.347 3524.415 0.246 2.326
exogamy 0.063 0.532 2666.218 0.466 1.407
ncmallow -0.001 0.000 711.209 0.986 1.464
famsize -0.068 0.616 3687.550 0.433 1.545
settype 0.035 0.237 13194.384 0.626 3.995
localjh 0.139 0.591 14075.452 0.442 1.614
superjh -0.106 0.911 7536.585 0.340 2.636
moralgods -0.010 0.009 4854.136 0.923 1.979
femsubs 0.107 2.137 6270.723 0.144 1.677
sexratio -0.008 0.002 53.246 0.969 1.377
money 0.070 0.657 2569.946 0.418 2.122
wagelabor 0.034 0.058 114.294 0.810 1.509
fydd 0.959 17.330 9412.027 0.000 3.098 plow -0.828 4.571 4807.793 0.033 2.961 hunt 0.208 3.537 5906.646 0.060 4.918 milk -0.646 2.787 4578.696 0.095 4.295 anim 0.188 4.000 5722.401 0.046 5.399 pigs -0.652 3.292 10317.708 0.070 2.395 fempower -0.104 2.821 111.975 0.096 1.328 war 0.030 4.096 1673.933 0.043 1.366 himilexp 0.425 3.368 215.052 0.068 1.496
[edit] Program 2 (9-17-2009) obsolete
#September 17, 2009, Anthon Eff #MI--estimate model, combine results rm(list=ls(all=TRUE)) #--Set path to your directory with data and program-- #setwd("d:/projects/MI") setwd("c:/My Documents/MI") options(echo=TRUE) #--need these packages for estimation and diagnostics-- library(foreign) library(spdep) library(car) library(lmtest) library(sandwich) #----------------------------- #--Read in data, rearrange---- #----------------------------- #--Read in original SCCS data--- load(url("http://frank.mtsu.edu/~eaeff/downloads/SCCS.Rdata"),.GlobalEnv) #--Read in two weight matrices-- ###ll<-as.matrix(read.dta("http://frank.mtsu.edu/~eaeff/downloads/langwm.dta")[,-1]) ###changed to ll<-as.matrix(read.dta("langwm.dta")[,-1]) ###dd<-as.matrix(read.dta("http://frank.mtsu.edu/~eaeff/downloads/dist25wm.dta")[,c(-1,-2,-189)]) ###changed to dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)]) #--Read in the imputed dataset--- load(url("http://frank.mtsu.edu/~eaeff/downloads/impdat.Rdata"),.GlobalEnv) #--create dep.varb. you wish to use from SCCS data-- ###--Here we sum variables measuring how much a society values children-- ###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max" ###DRW modified depvar depvar<-SCCS$v860 ###DRW #--find obs. for which dep. varb. is non-missing-- zdv<-which(!is.na(depvar)) depvar<-depvar[zdv] #--look at histogram and frequencies for the dep. varb.-- hist(depvar) table(depvar) #--modify weight matrices--- #--set diagonal equal to zeros-- diag(ll)<-0 diag(dd)<-0 #--use only obs. where dep. varb. non-missing-- ll<-ll[zdv,zdv] dd<-dd[zdv,zdv] #--row standardize (rows sum to one) ll<-ll/rowSums(ll) dd<-dd/rowSums(dd) #--check to see that rows sum to one rowSums(ll) rowSums(dd) #--make weight matrix object for later autocorrelation test-- wmatll<-mat2listw(as.matrix(ll)) wmatdd<-mat2listw(as.matrix(dd)) indpv<-c("femsubs","foodscarc", "exogamy","ncmallow","superjh","moralgods","fempower","sexratio", "war","himilexp","wagelabor", "famsize","settype","localjh","money", "cultints","roots","cereals","gath","hunt","fish", "anim","pigs","milk","plow","bovines","tree","foodtrade", "ndrymonth","ecorich", "popdens","pathstress","CVrain","rain","temp","AP1","AP2") #----------------------------------------------------- #---Estimate model on each imputed dataset------------ #----------------------------------------------------- #--number of imputed datasets-- nimp<-10 #--will append values to these empty objects-- vif<-NULL ss<-NULL beta<-NULL dng<-NULL #--loop through the imputed datasets-- for (i in 1:nimp){ #--select the ith imputed dataset-- m9<-impdat[which(impdat$.imp==i),] #--retain only obs. for which dep. varb. is nonmissing-- m9<-m9[zdv,] #--create spatially lagged dep. varbs.-- y<-as.matrix(depvar) xx<-as.matrix(m9[,indpv]) #--for instruments we use the spatial lag of our indep. varbs.-- #--First, the spatially lagged varb. for distance-- xdy<-dd%*%xx cy<-dd%*%y o<-lm(cy~xdy) #--the fitted value is our instrumental variable-- fydd<-fitted(o) #--keep R2 from this regression-- dr2<-summary(o)$r.squared #--Then, the spatially lagged varb. for language-- xdy<-ll%*%xx cy<-ll%*%y o<-lm(cy~xdy) #--the fitted value is our instrumental variable-- fyll<-fitted(o) #--keep R2 from this regression-- lr2<-summary(o)$r.squared m9<-cbind(m9,fydd,fyll) #--OLS estimate of unrestricted model-- xUR<-lm(depvar~fyll+fydd+dateobs+ ###DRW added dateobs cultints+roots+cereals+gath+plow+ hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+ ecorich+popdens+pathstress+exogamy+ncmallow+famsize+ settype+localjh+superjh+moralgods+fempower+femsubs+ sexratio+war+himilexp+money+wagelabor ,data=m9) #--OLS estimate of restricted model-- ###xR<-lm(depvar~fyll+fydd+settype+cultints+ ###roots+foodtrade+exogamy+ ###femsubs+fish, data=m9) ###DRW altered ###DRW added dateobs xR<-lm(depvar~fyll+fydd+dateobs+ cultints+roots+cereals+gath+plow+ hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+ ecorich+popdens+pathstress+exogamy+ncmallow+famsize+ settype+localjh+superjh+moralgods+fempower+femsubs+ sexratio+war+himilexp+money+wagelabor ,data=m9) #--collect coefficients and their variances-- ov<-summary(xR) vif<-rbind(vif,vif(xR)) ss<-rbind(ss,diag(ov$cov*ov$sigma^2)) #--collect robust coef. variances when there is heteroskedasticity-- #ss<-rbind(ss,diag(hccm(xR))) beta<-rbind(beta,coef(xR)) #--collect some model diagnostics-- dropt<-c("cereals","gath","plow","hunt","anim", "pigs","milk","bovines","foodscarc","ecorich", "popdens","pathstress","ncmallow","famsize","localjh", "superjh","moralgods","fempower","sexratio","money", "wagelabor","war","himilexp","tree") #--Ramsey RESET test-- p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE) #--F test that dropped variables had coefficient equal zero-- o<-linear.hypothesis(xUR,dropt,white.adjust=TRUE)$"Pr(>F)"[2] p2<-qchisq(o,1,lower.tail=FALSE) #--Heteroskedasticity test (H0: homoskedastic residuals)-- p3<-ncv.test(xR)$ChiSquare #--Shapiro-Wilke normality test (H0: residuals normal) p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE) #--LaGrange Multiplier test for spatial autocorrelation: language-- o<-lm.LMtests(xR, wmatll, test=c("LMlag")) p5<-as.numeric(o$LMlag$statistic) #--LaGrange Multiplier test for spatial autocorrelation: distance-- o<-lm.LMtests(xR, wmatdd, test=c("LMlag")) p6<-as.numeric(o$LMlag$statistic) #--model R2-- p7<-ov$r.squared dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2)) } #-------------------------------------------- #--Rubin's formulas for combining estimates-- #-------------------------------------------- #--first find final regr. coefs. and p-values-- mnb<-apply(beta,2,mean) vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1) mnv<-apply(ss,2,mean) vrT<-mnv+vrb*(1-nimp^(-1)) fst<-mnb^2/vrT r<-(1+nimp^(-1))*vrb/mnv v<-(nimp-1)*(1+r^(-1))^2 pval<-pf(fst,1,v,lower.tail=FALSE) bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3) names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF") #--Then combine the diagnostics we collected-- dng<-data.frame(dng) names(dng)<-c("RESET","F on restrs.","NCV","SWnormal","lagll","lagdd", "R2:final model","R2:IV(distance)","R2:IV(language)") r2<-apply(dng[,7:9],2,mean) adng<-dng[,1:6] mdm<-apply(adng,2,mean) vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1) aa<-4*mdm^2-2*vrd aa[which(aa<0)]<-0 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5) vd<-(nimp-1)*(1+rd^(-1))^2 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd) #-All chi-sq we collected have df=1------- pvald<-pf(Dm,1,vd,lower.tail=FALSE) ccc<-data.frame(round(cbind(Dm,vd,pvald),3)) names(ccc)<-c("Fstat","df","pvalue") bbb r2 ccc #--write results to csv file for perusal in spreadsheet-- write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE) write.csv(bbb,file="OLSresults.csv",append=TRUE) write.csv(r2,file="OLSresults.csv",append=TRUE) write.csv(ccc,file="OLSresults.csv",append=TRUE)
[edit] Program 2 (9-19-2009) obsolete
#September 19, 2009, Anthon Eff
#MI--estimate model with network-lagged dependent variables, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
###setwd("d:/projects/MI")
setwd("c:/My Documents/MI")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
###ll<-as.matrix(read.dta("langwm.dta")[,-1])
###changed to
ll<-as.matrix(read.dta("langwm.dta")[,-1])
###dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
###changed to
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
###--Here we sum variables measuring how much a society values children--
###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
###DRW modified depvar
depvar<-SCCS$v860 ###DRW
dateobs<-SCCS$v838 ###DRW
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
indpv<-c("femsubs","foodscarc",
"exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
"war","himilexp","wagelabor",
"famsize","settype","localjh","money",
"cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich",
"popdens","pathstress","CVrain","rain","temp","AP1","AP2")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
nimp<-10
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y ###EAE changed
o<-lm(cyd~xdy) ###EAE changed
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xdy<-ll%*%xx ###EAE changed
cyl<-ll%*%y ###EAE changed
o<-lm(cyl~xdy)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+dateobs+ ###DRW added dateobs
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)
#--OLS estimate of restricted model--
###xR<-lm(depvar~fyll+fydd+settype+cultints+
###roots+foodtrade+exogamy+
###femsubs+fish, data=m9)
###DRW altered ###DRW added dateobs
xR<-lm(depvar~fyll+fydd+dateobs+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*ov$sigma^2))
#--collect robust coef. variances when there is heteroskedasticity--
#ss<-rbind(ss,diag(hccm(xR)))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"wagelabor","war","himilexp","tree")
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--F test that dropped variables had coefficient equal zero--
o<-linear.hypothesis(xUR,dropt,white.adjust=TRUE)$"Pr(>F)"[2]
p2<-qchisq(o,1,lower.tail=FALSE)
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-ov$r.squared
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","F on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
bbb
r2
ccc
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
[edit] Program 2 (9-22-2009) current
#MI--estimate model with network-lagged dependent variables, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
###setwd("d:/projects/MI")
setwd("c:/My Documents/MI")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#--can replace "sum" with "max"
childvalue<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum)
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(childvalue))
childvalue<-childvalue[zdv]
#--look at frequencies and quartiles for the dep. varb.--
summary(childvalue)
table(childvalue)
#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
indpv<-c("femsubs","foodscarc","exogamy","ncmallow","superjh","moralgods",
"fempower","sexratio","war","himilexp","wagelabor","famsize","settype",
"localjh","money","cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich","popdens","pathstress","CVrain","rain",
"temp","AP1","AP2","migr","brideprice","nuclearfam","pctFemPolyg")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
nimp<-10
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(childvalue)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)
#--OLS estimate of unrestricted model--
xUR<-lm(childvalue~fyll+fydd+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor+
migr+brideprice+nuclearfam+pctFemPolyg
,data=m9)
#--OLS estimate of restricted model--
xR<-lm(childvalue ~ fyll + cultints + roots + fish +
exogamy + settype + femsubs, data = m9)
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"fydd","wagelabor","war","himilexp","tree","foodtrade")
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE)
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
bbb
r2
ccc
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for childvalue==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
[edit] Program 2 (9-22-2009) rev 2
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
###setwd("d:/projects/MI")
setwd("c:/My Documents/MI")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
###load(url("http://frank.mtsu.edu/~eaeff/downloads/SCCS.Rdata"),.GlobalEnv)
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
###ll<-as.matrix(read.dta("http://frank.mtsu.edu/~eaeff/downloads/langwm.dta")[,-1])
ll<-as.matrix(read.dta("langwm.dta")[,-1])
###dd<-as.matrix(read.dta("http://frank.mtsu.edu/~eaeff/downloads/dist25wm.dta")[,c(-1,-2,-189)])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
###load(url("http://frank.mtsu.edu/~eaeff/downloads/impdat.Rdata"),.GlobalEnv)
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
###depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
depvar<-SCCS$v860 ###new polygyny variable
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
dateobs<-SCCS$v838 ###DRW NEW
dateobs<-dateobs[zdv] ###DRW NEW
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
indpv<-c("femsubs","foodscarc",
"exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
"war","himilexp","wagelabor",
"famsize","settype","localjh","money",
"cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich",
"popdens","pathstress","CVrain","rain","temp","AP1","AP2")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
nimp<-10
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+ ###had been extra + (++) here and below
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)
#--OLS estimate of restricted model--
### xR<-lm(depvar~fyll+fydd+settype+cultints+
### roots+foodtrade+exogamy+
### femsubs+fish, data=m9)
### DRW ADDED ALL THE VARIABLES BELOW TO REPLACE THOSE ABOVE: WHY SO RESTRICTED?
#--OLS estimate of restricted model--
xR<-lm(depvar~fyll+fydd+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"wagelabor","war","himilexp","tree")
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3))
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
bbb
r2
ccc
#Corrected to publication version with depvarname
#--write results to csv file for perusal in spreadsheet--
write.csv(paste("2SLS model for ",depvarname,sep=""),file="OLSresults.csv", append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
[edit] Follow up
What your R results are still on the screen Paste everything from xR to the end into the Polygyny follow-up and edit out the nonsignificant variables, then rerun while R is still open. (Dont use this file but make a copy)
Deborah Blumenthal's problem Doug 07:33, 7 October 2009 (PDT) Will help you work on this
