User:Jasiellt
ANTH174_10 your depvar<-SCCS$v57 is
Intro
Contribution by Women to Agriculture
821. Percent Female Contribution to Agriculture
. = Missing Data
0 = 0%
1 = 10%
2 = 20%
3 = 30%
4 = 40%
5 = 50%
6 = 60%
7 = 70%
8 = 80%
9 = 90%
10 =100%
As in:
- Ember R. Carol.American Anthropologist 1983 http://www.jstor.org/stable/676314
- Dwight B. Heath. 1958 http://www.jstor.org/stable/2573784?cookieSet=1
- Herbert Barry,III Brian L. Yoder 2002 http://ccr.sagepub.com/cgi/reprint/36/3/286
- Herbert Barry, III Brian L. Yoder 2002 http://ccr.sagepub.com/cgi/content/abstract/36/3/286 Multiple Predictors of Contribution by Women to Agriculture. Cross-Cultural Research, Vol. 36, No. 3, 286-297 (2002). Doug 13:54, 6 February 2010 (PST)Should be an interesting comparison as these might not be significant in 2SLS. QUOTE P. 289-290:
- The following six predictors were selected and eachdivided into two categories. For each predictor, the adjusted correlation with high contribution by women to agriculture is positive for the first category and negative for the second category.
- 1. Residence is not patrilocal after marriage (Murdock & Wilson, 1972). The 35 societies in the first category, not patrilocal, combine 22 with matrilocal, 5 with avunculocal, 4 with neolocal, and 4 with ambilocal residence. The second category is patrilocal residence for 73 societies.
- 2. General polygyny is the prevalent form of marriage in 38 societies (Murdock & Wilson, 1972). The second category, for 70 societies, combines 49 with limited polygyny and 21 with monogamy. None of the societies has polyandrous marriage.
- 3. No written language is used (Murdock & Provost, 1973b). The 63 societies with no written language combine 39 where written records are absent and 24 with simple tallies. The second category, for 45 societies, combines 11 with nonwritten records such as picture writing, 7 with sparse accumulation of indigenous written records or long use of alien writing, and 27 with indigenous written records.
- 4. Use of money is indigenous (Murdock & Provost, 1973b). The 43 societies with indigenous use of money combine 23 with metal coins or paper currency and 20 with articles of token value, such as
cowrie shells or wampum. The second category, for 65 societies, combines 25 with long use of currency of an alien people, 8 with usable articles such as salt, and 32 with exchange exclusively by barter.
- 5. Population is sparse instead of dense (Murdock & Provost, 1973b). The 44 societies with sparse population combine 11 with fewer than 1 person per square mile, 10 with 5 people or fewer per square
mile, and 23 with 25 people or fewer per square mile. The second category, for 64 societies, combines 29 with 100 people or fewer per square mile and 35 with more than 100 people per square mile.
- 6. No milk is obtained from domestic animals in 88 societies (Murdock & Morrow, 1970). The second category contains 20 societies that obtain milk from domestic animals.
TABLE 1 Unadjusted and Adjusted Correlations for Each of Six Cultural Customs With Contribution by Women to Agriculture
- Cultural Custom Unadjusted
- Correlation Adjusted Correlation
- Not patrilocal .37*** .41***
- Polygynous marriage .37*** .38***
- No written language .49*** .35***
- Indigenous money .04 .34***
- Sparse population .31** .26**
- No milk obtained .40*** .22*
- Multiple correlation .73
- Adjusted proportion .50***
- NOTE: Correlations shown are for the world sample of 108 agricultural societies.
The adjusted correlation is the partial correlation of each custom with contribution by women to agriculture, adjusting for the effects of the other five customs. Summary values are the multiple correlation and the adjusted proportion of the variance accounted for by the multiple correlation. Probability values are for the difference from zero correlation.
- p < .05. **p < .01. ***p < .001.
CODE
3. AGRICULTURE- CONTRIBUTION TO LOCAL FOOD SUPPLY
35 1 = None
3 2 = Non-food Crops
17 3 = < 10%
12 4 = < 50%, and less than any other single source, incl. trade
42 5 = < 50%, and more than any other single source, incl. trade
77 6 = Primarily agricultural
Day 1 (january 5th)
ctrl-a will allow you to select all contents found within the page.
Double-square brackets [[]] can be used for referencing something within this Wikipedia website
For indexing purposes use two equal signs: ==heading== make sure you have your heading in the middle of the two-brackets
Day 2 (january 7th)
cntrl-f allows search box to be in use. On website: first map is pre-conquest. second map is post-conquest when looking at war domestic animals can be a variable
xVR never change this drop-> except when there is an indep variable ---depvar add-> indep ver alter your xR is done
Correlation does NOT mean causation ex. mud -> rain Particularly with humans, what happens in one society easily affects others 95% American Indian population died because of new germs introduced from other areas of the world
for homework:find the dependent variables. Look at resources, map it out (GIS mapping)and go to codebook to find variable.
Day 3 (january 12)
textpad - adding a space and then copying results from wiki -allows me to see everything in a clear way and not all cramped together.
this talk explains how these benefits can contribute to a proposed open eRepository for anthropological knowledge! talk will have 5 parts.
- 1. Benefits: to analyze
- A. solving Galton's problem with 2 stage least squares ( this is the "network problem" societies are related by historical historical network processes by which they influence one another.
- B. Casualty C. MI D. GIS
- 2. benefits: to data
- 3. Benefits: findings and models
- 4. Benefits: findings of networks
- 5. benefits: galtons problem of nonindependence has been solved using instrumental variables for language and spatial clustering.
Instrumantal variables
- Regression models are not well estimated when their error terms are not independent but contain the residusls of unspecified variables or systematic networl interactions.
Instrumental variables(IV) are observed variables in structural models(SEMs), other than the cause or treatment of interest, that can play an instrumantal role in identifying and estimating casual effects.
they require 2 stages OF OLS analysis 2SLS
notes for Feb.2
Outline notes exploring your topic readings, G scholer Maps - 60 "extra codes" to do/ or not -topic choice - readings on the topic( ex 1 and ex 2) - depvar and name must match - questions in class - avoid # sign in
Main topic: from corelations to causality Anthro's error: 1889 tylor- galton all disciplines suffer problem of determionistic theory in discipine + policy - challenger disaster ( ITS BEEN VERY COLD AND IF ITR WAS GNNA HARM THE EQUIPMENT, so they did a deterministic analysis, loses our governments tax money, just money thrown away) they were look ing for a regularity thats what they did wrong. only 3 people new that enrod was a disaster and was going to collapse! the whole system collapsing probabilistic thinking
2SLS
*the IVariables in 2SLS tend to reduce significance when these are operative.
*W for matrices.
*multiply each row by y gives ur dependent variable! it is influenced byt the "background similarity" of a common linguistics.
*Open textpad, and space, do recopy sequence But this one you edumod page
bbb
coef Fstat ddf pvalue VIF
(Intercept) 0.62582705 0.14144757 1.209601e+01 0.71335960 NA
fydd 0.67771647 11.84978111 2.263947e+01 0.00225505 2.317
pre_mar_sex 0.01874176 0.04552059 1.062731e+01 0.83508732 1.378
money 0.05520660 0.51828362 1.663585e+02 0.47258499 2.231
foodstress 0.06493711 0.23932771 2.442313e+01 0.62905693 1.454
femctrldwellg -0.14597995 1.44060574 4.661852e+00 0.28745733 1.607
wealthy 0.00418724 0.00100960 1.347243e+01 0.97511810 1.651
fratgrpstr 0.09011682 0.49066176 5.308538e+00 0.51314053 3.029
marrcaptives 0.28228275 4.80125914 9.560179e+00 0.05443314 1.782
plunder -0.63349068 7.67922912 1.289722e+01 0.01598154 1.559
cultints -0.00135031 0.00017242 1.521656e+03 0.98952517 5.159
roots 0.45780807 1.08082787 2.774757e+03 0.29860301 5.131
cereals 0.29738312 0.49166316 5.388559e+03 0.48321662 7.433
gath 0.09304185 0.83328532 2.484151e+01 0.37009901 3.226
plow -0.53825557 1.99462116 8.583336e+01 0.16147400 3.274
hunt 0.12978343 1.59476698 2.976053e+03 0.20674532 5.214
fish -0.01792164 0.05433460 1.863767e+02 0.81594124 3.091
anim 0.14029967 2.56869429 1.026653e+05 0.10900074 5.781
pigs -0.53273657 2.58995167 1.290100e+02 0.10998795 2.333
milk -0.78923686 4.42849239 7.040738e+01 0.03891527 4.554
bovines 0.30426680 0.77904007 4.042261e+02 0.37795946 4.454
tree 0.51246025 0.82815823 3.078110e+02 0.36351768 3.332
foodtrade -0.01644826 2.94644139 1.591187e+02 0.08801288 1.525
foodscarc -0.05109507 0.39073075 1.272545e+01 0.54296238 1.438
ecorich -0.05468527 0.40071359 4.294367e+03 0.52675467 1.954
popdens 0.14071013 2.00011355 1.443825e+02 0.15944020 3.770
pathstress 0.03580103 1.10392267 6.572938e+01 0.29725451 2.503
exogamy 0.03398801 0.17983079 1.629889e+02 0.67207786 1.432
ncmallow -0.01019023 0.06026975 1.740922e+01 0.80894084 1.511
famsize -0.05764492 0.47947465 6.563276e+01 0.49110312 1.599
settype -0.00717448 0.01081782 6.265193e+01 0.91749463 4.155
localjh 0.00205560 0.00014427 4.109672e+02 0.99042257 1.706
superjh -0.10215123 0.95842870 2.390200e+02 0.32857292 2.713
moralgods 0.01874915 0.02836560 7.831533e+00 0.87052037 1.929
fempower -0.08698682 2.42768469 1.833743e+01 0.13630000 1.428
femsubs 0.12607532 3.45693030 2.778303e+04 0.06299771 1.758
sexratio -0.14286337 0.53003480 4.857335e+00 0.50015572 1.421
war 0.00615811 0.12327135 8.194928e+00 0.73436904 1.621
himilexp 0.25267635 1.56287499 5.900076e+02 0.21174037 1.612
wagelabor 0.03886651 0.11489348 7.404447e+01 0.73559972 1.499
> r2
R2:final model R2:IV(distance) R2:IV(language)
0.5457357 0.9860286 0.9812755
> ccc
Fstat df pvalue
RESET 5.164 104.814 0.025
Wald on restrs. 2.295 24.305 0.143
NCV 0.131 3190.844 0.717
SWnormal 0.146 138.033 0.703
lagll 2.859 5640.802 0.091
lagdd 5.023 14575.154 0.025
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) 0.626 0.141 12.096 0.713 NA
fydd 0.678 11.850 22.639 0.002 2.317
pre_mar_sex 0.019 0.046 10.627 0.835 1.378
money 0.055 0.518 166.359 0.473 2.231
foodstress 0.065 0.239 24.423 0.629 1.454
femctrldwellg -0.146 1.441 4.662 0.287 1.607
wealthy 0.004 0.001 13.472 0.975 1.651
fratgrpstr 0.090 0.491 5.309 0.513 3.029
marrcaptives 0.282 4.801 9.560 0.054 1.782
plunder -0.633 7.679 12.897 0.016 1.559
cultints -0.001 0.000 1521.656 0.990 5.159
roots 0.458 1.081 2774.757 0.299 5.131
cereals 0.297 0.492 5388.559 0.483 7.433
gath 0.093 0.833 24.842 0.370 3.226
plow -0.538 1.995 85.833 0.161 3.274
hunt 0.130 1.595 2976.053 0.207 5.214
fish -0.018 0.054 186.377 0.816 3.091
anim 0.140 2.569 102665.273 0.109 5.781
pigs -0.533 2.590 129.010 0.110 2.333
milk -0.789 4.428 70.407 0.039 4.554
bovines 0.304 0.779 404.226 0.378 4.454
tree 0.512 0.828 307.811 0.364 3.332
foodtrade -0.016 2.946 159.119 0.088 1.525
foodscarc -0.051 0.391 12.725 0.543 1.438
ecorich -0.055 0.401 4294.367 0.527 1.954
popdens 0.141 2.000 144.382 0.159 3.770
pathstress 0.036 1.104 65.729 0.297 2.503
exogamy 0.034 0.180 162.989 0.672 1.432
ncmallow -0.010 0.060 17.409 0.809 1.511
famsize -0.058 0.479 65.633 0.491 1.599
settype -0.007 0.011 62.652 0.917 4.155
localjh 0.002 0.000 410.967 0.990 1.706
superjh -0.102 0.958 239.020 0.329 2.713
moralgods 0.019 0.028 7.832 0.871 1.929
fempower -0.087 2.428 18.337 0.136 1.428
femsubs 0.126 3.457 27783.028 0.063 1.758
sexratio -0.143 0.530 4.857 0.500 1.421
war 0.006 0.123 8.195 0.734 1.621
himilexp 0.253 1.563 590.008 0.212 1.612
wagelabor 0.039 0.115 74.044 0.736 1.499
> aaa
1 2 3 4 5
"27" "32" "45" "34" "46" "184" "polygyny"
> impute
PROGRAM for agricultural unrestricted model
#Program 1 --> Program 2 below
#Program 1 Part A
#MI--make the imputed datasets
#--change the following path to the directory with your data and program--
setwd("C:/My Documents/MI")
rm(list=ls(all=TRUE))
options(echo=TRUE)
#--you need the following two packages--you must install them first--
library(foreign)
library(mice)
library(tripak)
library(zoo)
library(sp)
library(maptools)
library(spam))
#--for program 2 below
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#--To find the citation for a package, use this function:---
citation("mice")
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in auxiliary variables---
load("vaux.Rdata",.GlobalEnv)
row.names(vaux)<-NULL
#--Read in the SCCS dataset---
load("SCCS.Rdata",.GlobalEnv)
#--look at first 6 rows of vaux--
head(vaux)
#--look at field names of vaux--
names(vaux)
#--check to see that rows are properly aligned in the two datasets--
#--sum should equal 186---
sum((SCCS$socname==vaux$socname)*1)
#--remove the society name field--
vaux<-vaux[,-28]
names(vaux)
#--Two nominal variables: brg and rlg----
#--brg: consolidated Burton Regions-----
#0 = (rest of world) circumpolar, South and Meso-America, west North America
#1 = Subsaharan Africa
#2 = Middle Old World
#3 = Southeast Asia, Insular Pacific, Sahul
#4 = Eastern Americas
#--rlg: Religion---
#'0 (no world religion)'
#'1 (Christianity)'
#'2 (Islam)'
#'3 (Hindu/Buddhist)'
#--check to see number of missing values in vaux,
#--whether variables are numeric,
#--and number of discrete values for each variable---
vvn<-names(vaux)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(vaux[,vvn[i]])))
numeric<-is.numeric(vaux[,vvn[i]])
numDiscrVals<-length(table(vaux[,vvn[i]]))
pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
}
row.names(pp)<-vvn
pp
#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--extract variables to be used from SCCS, put in dataframe fx--
fx<-data.frame(
socname=SCCS$socname,socID=SCCS$"sccs#",
Agricultural=log(1+SCCS$v821),
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180),
lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182),
valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
fratgrpstr=SCCS$v570,
marrcaptives=SCCS$v870,
plunder=SCCS$v912,
###climate<-(SCCS$v857==1 | SCCS$v857==2 | SCCS$v857==6)*1+(SCCS$v857==3 | SCCS$v857==4)*2+(SCCS$v857==5)*3,
nonmatrel=SCCS$v52,
lrgfam=SCCS$v68,
exogamy=SCCS$v72,
famsize=SCCS$v80,
money=SCCS$v155,
popdens=SCCS$v156,
malesexag=SCCS$v175,
ndrymonth=SCCS$v196,
gath=SCCS$v203,
hunt=SCCS$v204,
fish=SCCS$v205,
anim=SCCS$v206,
brideprice=(SCCS$v208==1)*1,
nuclearfam=(SCCS$v210<=3)*1,
ncmallow=SCCS$v227,
cultints=SCCS$v232,
tree=(SCCS$v233==4)*1,
roots=(SCCS$v233==5)*1,
cereals=(SCCS$v233==6)*1,
settype=SCCS$v234,
localjh=(SCCS$v236-1),
superjh=SCCS$v237,
moralgods=SCCS$v238,
segadlboys=SCCS$v242,
plow=(SCCS$v243>1)*1,
pigs=(SCCS$v244==2)*1,
bovines=(SCCS$v244==7)*1,
milk=(SCCS$v245>1)*1,
agrlateboy=SCCS$v300,
fempower=SCCS$v663,
migr=(SCCS$v677==2)*1,
foodtrade=SCCS$v819,
dateobs=SCCS$v838,
rain=SCCS$v854,
temp=SCCS$v855,
ecorich=SCCS$v857,
#pctFemPolyg=SCCS$v872,
femsubs=SCCS$v890,
himilexp=(SCCS$v899==1)*1,
AP1=SCCS$v921,
AP2=SCCS$v928,
pathstress=SCCS$v1260,
war=SCCS$v1648,
foodscarc=SCCS$v1685,
sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
wagelabor=SCCS$v1732,
CVrain=SCCS$v1914/SCCS$v1913
)
#--look at first 6 rows of fx--
head(fx)
#--check to see number of missing values--
#--also check whether numeric--
vvn<-names(fx)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(fx[,vvn[i]])))
numeric<-is.numeric(fx[,vvn[i]])
pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
}
row.names(pp)<-vvn
pp
#--identify variables with missing values--
z<-which(pp[,1]>0)
zv1<-vvn[z]
zv1
#--identify variables with non-missing values--
z<-which(pp[,1]==0)
zv2<-vvn[z]
zv2
#Program 1 Part B
#-----------------------------
#----Multiple imputation------
#-----------------------------
#--number of imputed data sets to create--
#nimp<-10 #CHANGED TO TEST SPEEDUP
nimp<-3 ################################################################## speedup test
#--one at a time, loop through those variables with missing values--
for (i in 1:length(zv1)){
#--attach the imputand to the auxiliary data--
zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
#--in the following line, the imputation is done--
aqq<-complete(mice(zxx,maxit=20,m=nimp),action="long")
#--during first iteration of the loop, create dataframe impdat--
if (i==1){
impdat<-data.frame(aqq[,c(".id",".imp")])
}
#--the imputand is placed as a field in impdat and named--
impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
names(impdat)[NCOL(impdat)]<-zv1[i]
}
#--now the non-missing variables are attached to impdat--
gg<-NULL
for (i in 1:nimp){
gg<-rbind(gg,data.frame(fx[,zv2]))
}
impdat<-cbind(impdat,gg)
#--take a look at the top 6 and bottom 6 rows of impdat--
head(impdat)
tail(impdat)
#--impdat is saved as an R-format data file--
save(impdat,file="impdat.Rdata")
#Program 2 (12-18-2009)==
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
depvar<-log(1+SCCS$v821)
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
depvarname<-"PctFemContAg"
#--can add additional SCCS variable, but only if it has no missing values---
dateobs<-SCCS$v838
dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
indpv<-c("femsubs","foodscarc",
"fratgrpstr","marrcaptives","plunder",###"climate",
"exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
"war","himilexp","wagelabor",
"famsize","settype","localjh","money",
"cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich",
"popdens","pathstress","CVrain","rain","temp","AP1","AP2")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
#nimp<-10
nimp<-3 ################################################################## speedup test
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd++AP1+AP2+
fratgrpstr+marrcaptives+plunder+###climate+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)
#--OLS estimate of restricted model--
xR<-lm(depvar~fyll+fydd+AP1+AP2+
fratgrpstr+marrcaptives+plunder+###climate+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)
# fydd+ ### fyll+ <--- ADDED fydd for distance clustering
# fratgrpstr+marrcaptives+plunder+ ###climate+fish+
# plow+ ###cereals+gath+
# ###hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
# ###pathstress+ ecorich+popdens+exogamy+ncmallow+famsize+
# ###settype+localjh+superjh+moralgods+
# fempower+femsubs ###+
# ###war+himilexp +money+wagelabor ###sexratio+
# ,data=m9)
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("cereals","gath","plow","hunt","anim",
"pigs","milk","bovines","foodscarc","ecorich",
"popdens","pathstress","ncmallow","famsize","localjh",
"superjh","moralgods","fempower","sexratio","money",
"wagelabor","war","himilexp","tree")
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
aaa<-c(table(depvar), NROW(depvar),depvarname)
imp<-"number of imputations nimp="
impute=c(imp,nimp)
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
lon[90]=-131
bbb
r2
ccc
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
bbb
aaa
impute
hist(log(cyl/y))
results
r2
R2:final model R2:IV(distance) R2:IV(language)
0.6457590 0.9817316 0.9833483
> ccc
Fstat df pvalue
RESET 19.176 128.159 0.000
Wald on restrs. 2.877 57.587 0.095
NCV 23.280 22.166 0.000
SWnormal 16.474 25.890 0.000
lagll 2.367 44059.470 0.124
lagdd 3.225 7025.631 0.073
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) -1.476 0.437 531.357 0.509 NA
fyll 1.071 4.031 1086.194 0.045 2.938
fydd -0.282 0.627 706.426 0.429 4.443
AP1 -0.020 0.321 927.642 0.571 2.168
AP2 0.041 0.437 9915.745 0.509 1.708
fratgrpstr -0.024 0.079 24.389 0.782 2.773-->delte
marrcaptives -0.070 0.491 119.974 0.485 2.264
plunder -0.196 1.230 317.362 0.268 1.894
cultints 0.163 4.544 1336.274 0.033 2.757-->sign
roots -0.594 1.182 406.992 0.278 14.492
cereals -0.309 0.345 3162.740 0.557 16.870
gath 0.080 0.711 289.046 0.400 2.399
plow -0.147 0.246 3494.554 0.620 4.055
hunt 0.155 2.878 2966.755 0.090 3.906-->sign
fish 0.083 1.491 278216.956 0.222 2.410
anim -0.135 2.920 220.205 0.089 4.408--.sign
pigs 0.079 0.096 333.018 0.757 2.640
milk -0.375 1.917 711.290 0.167 4.215
bovines 0.496 3.254 361.576 0.072 4.717-->sign
tree -0.271 0.193 2637.192 0.661 7.594
foodtrade -0.007 0.651 164.858 0.421 1.774
foodscarc -0.010 0.032 39.113 0.859 1.348-->delete
ecorich 0.052 0.540 408526.751 0.462 1.808
popdens -0.103 1.861 2563.314 0.173 3.283
pathstress 0.007 0.086 1552.587 0.769 1.862-->delete
exogamy 0.083 1.587 169.793 0.210 1.569
ncmallow -0.027 0.666 30.433 0.421 1.622
famsize 0.059 0.838 2424.650 0.360 1.611
settype 0.075 1.792 90457254.210 0.181 2.980
localjh -0.047 0.121 49481.470 0.728 1.776
superjh 0.008 0.010 1960.800 0.921 2.890--.delete
moralgods 0.050 0.366 24012.380 0.545 2.462
fempower -0.024 0.275 14.146 0.608 1.433
femsubs 0.457 62.526 631.620 0.000 1.900
sexratio 0.027 0.038 18.496 0.848 1.436-->delete
war -0.012 1.059 6685.954 0.304 1.760
himilexp 0.113 0.444 110.975 0.507 1.595
money -0.029 0.204 2183.696 0.652 2.415
wagelabor 0.090 0.951 6512.520 0.330 1.743
> aaa
0 1.79175946922805 1.94591014905531 2.39789527279837
"8" "7" "1" "4"
2.77258872223978 2.99573227355399 3.04452243772342 3.25809653802148
"9" "2" "9" "9"
3.43398720448515 3.46573590279973 3.52636052461616 3.58351893845611
"5" "2" "1" "10"
3.66356164612965 3.71357206670431 3.80666248977032 3.82864139648910
"1" "12" "2" "9"
3.93182563272433 4.02535169073515 4.04305126783455 4.11087386417331
"9" "6" "2" "4"
4.15888308335967 4.18965474202643 4.21950770517611 4.26267987704132
"1" "11" "1" "5"
4.33073334028633 4.39444915467244 4.45434729625351 4.56434819146784
"3" "3" "4" "1"
4.61512051684126
"1" "142" "PctFemContAg"
> impute
[1] "number of imputations nimp=" "3"
> hist(log(cyl/y))
>
program
- Program 1 --> Program 2 below
#Program 1 Part A
#MI--make the imputed datasets
#--change the following path to the directory with your data and program--
setwd("C:/My Documents/MI")
rm(list=ls(all=TRUE))
options(echo=TRUE)
#--you need the following two packages--you must install them first--
library(foreign)
library(mice)
library(tripak)
library(zoo)
library(sp)
library(maptools)
library(spam))
#--for program 2 below
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#--To find the citation for a package, use this function:---
citation("mice")
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in auxiliary variables---
load("vaux.Rdata",.GlobalEnv)
row.names(vaux)<-NULL
#--Read in the SCCS dataset---
load("SCCS.Rdata",.GlobalEnv)
#--look at first 6 rows of vaux--
head(vaux)
#--look at field names of vaux--
names(vaux)
#--check to see that rows are properly aligned in the two datasets--
#--sum should equal 186---
sum((SCCS$socname==vaux$socname)*1)
#--remove the society name field--
vaux<-vaux[,-28]
names(vaux)
#--Two nominal variables: brg and rlg----
#--brg: consolidated Burton Regions-----
#0 = (rest of world) circumpolar, South and Meso-America, west North America
#1 = Subsaharan Africa
#2 = Middle Old World
#3 = Southeast Asia, Insular Pacific, Sahul
#4 = Eastern Americas
#--rlg: Religion---
#'0 (no world religion)'
#'1 (Christianity)'
#'2 (Islam)'
#'3 (Hindu/Buddhist)'
#--check to see number of missing values in vaux,
#--whether variables are numeric,
#--and number of discrete values for each variable---
vvn<-names(vaux)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(vaux[,vvn[i]])))
numeric<-is.numeric(vaux[,vvn[i]])
numDiscrVals<-length(table(vaux[,vvn[i]]))
pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
}
row.names(pp)<-vvn
pp
#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--extract variables to be used from SCCS, put in dataframe fx--
fx<-data.frame(
socname=SCCS$socname,socID=SCCS$"sccs#",
Agricultural=SCCS$v821,
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180),
lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182),
valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
fratgrpstr=SCCS$v570,
marrcaptives=SCCS$v870,
plunder=SCCS$v912,
###climate<-(SCCS$v857==1 | SCCS$v857==2 | SCCS$v857==6)*1+(SCCS$v857==3 | SCCS$v857==4)*2+(SCCS$v857==5)*3,
nonmatrel=SCCS$v52,
lrgfam=SCCS$v68,
exogamy=SCCS$v72,
famsize=SCCS$v80,
money=SCCS$v155,
popdens=SCCS$v156,
malesexag=SCCS$v175,
ndrymonth=SCCS$v196,
gath=SCCS$v203,
hunt=SCCS$v204,
fish=SCCS$v205,
anim=SCCS$v206,
brideprice=(SCCS$v208==1)*1,
nuclearfam=(SCCS$v210<=3)*1,
ncmallow=SCCS$v227,
cultints=SCCS$v232,
tree=(SCCS$v233==4)*1,
roots=(SCCS$v233==5)*1,
cereals=(SCCS$v233==6)*1,
settype=SCCS$v234,
localjh=(SCCS$v236-1),
superjh=SCCS$v237,
moralgods=SCCS$v238,
segadlboys=SCCS$v242,
plow=(SCCS$v243>1)*1,
pigs=(SCCS$v244==2)*1,
bovines=(SCCS$v244==7)*1,
milk=(SCCS$v245>1)*1,
agrlateboy=SCCS$v300,
fempower=SCCS$v663,
migr=(SCCS$v677==2)*1,
foodtrade=SCCS$v819,
dateobs=SCCS$v838,
rain=SCCS$v854,
temp=SCCS$v855,
ecorich=SCCS$v857,
#pctFemPolyg=SCCS$v872,
###femsubs=SCCS$v890, ###Dep var is added to get this ... so deleted
himilexp=(SCCS$v899==1)*1,
AP1=SCCS$v921,
AP2=SCCS$v928,
pathstress=SCCS$v1260,
war=SCCS$v1648,
foodscarc=SCCS$v1685,
sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
wagelabor=SCCS$v1732,
CVrain=SCCS$v1914/SCCS$v1913
)
#--look at first 6 rows of fx--
head(fx)
#--check to see number of missing values--
#--also check whether numeric--
vvn<-names(fx)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(fx[,vvn[i]])))
numeric<-is.numeric(fx[,vvn[i]])
pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
}
row.names(pp)<-vvn
pp
#--identify variables with missing values--
z<-which(pp[,1]>0)
zv1<-vvn[z]
zv1
#--identify variables with non-missing values--
z<-which(pp[,1]==0)
zv2<-vvn[z]
zv2
#Program 1 Part B
#-----------------------------
#----Multiple imputation------
#-----------------------------
#--number of imputed data sets to create--
#nimp<-10 #CHANGED TO TEST SPEEDUP
nimp<-3 ################################################################## speedup test
#--one at a time, loop through those variables with missing values--
for (i in 1:length(zv1)){
#--attach the imputand to the auxiliary data--
zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
#--in the following line, the imputation is done--
aqq<-complete(mice(zxx,maxit=20,m=nimp),action="long")
#--during first iteration of the loop, create dataframe impdat--
if (i==1){
impdat<-data.frame(aqq[,c(".id",".imp")])
}
#--the imputand is placed as a field in impdat and named--
impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
names(impdat)[NCOL(impdat)]<-zv1[i]
}
#--now the non-missing variables are attached to impdat--
gg<-NULL
for (i in 1:nimp){
gg<-rbind(gg,data.frame(fx[,zv2]))
}
impdat<-cbind(impdat,gg)
#--take a look at the top 6 and bottom 6 rows of impdat--
head(impdat)
tail(impdat)
#--impdat is saved as an R-format data file--
save(impdat,file="impdat.Rdata")
#Program 2 (12-18-2009)==
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
depvar<-SCCS$v821 ###log(1+SCCS$v821) ###
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
depvarname<-"PctFemContAg"
#--can add additional SCCS variable, but only if it has no missing values---
dateobs<-SCCS$v838
dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
indpv<-c("foodscarc",###"femsubs",
"fratgrpstr","marrcaptives","plunder",###"climate",
"exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
"war","himilexp","wagelabor",
"famsize","settype","localjh","money",
"cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich",
"popdens","pathstress","CVrain","rain","temp","AP1","AP2")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
#nimp<-10
nimp<-3 ################################################################## speedup test
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd++AP1+AP2+
fratgrpstr+marrcaptives+plunder+###climate+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+###femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)
#--OLS estimate of restricted model--
xR<-lm(depvar~fyll+fydd+AP1+AP2+
marrcaptives+plunder+###climate+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+
ecorich+popdens+exogamy+ncmallow+famsize+
settype+localjh+moralgods+fempower+###femsubs+
war+himilexp+money+wagelabor
,data=m9)
# fydd+ ### fyll+ <--- ADDED fydd for distance clustering
# fratgrpstr+marrcaptives+plunder+ ###climate+fish+
# plow+ ###cereals+gath+
# ###hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
# ###pathstress+ ecorich+popdens+exogamy+ncmallow+famsize+
# ###settype+localjh+superjh+moralgods+
# fempower+femsubs ###+
# ###war+himilexp +money+wagelabor ###sexratio+
# ,data=m9)
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
aaa<-c(table(depvar), NROW(depvar),depvarname)
imp<-"number of imputations nimp="
impute=c(imp,nimp)
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
lon[90]=-131
bbb
r2
ccc
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
bbb
aaa
impute
2nd results replaced: no femsubs, no log(femcontagri) all diagnostics ok now
R2:final model R2:IV(distance) R2:IV(language)
0.3953282 0.9721403 0.9850290
Fstat df pvalue
RESET 0.024 2193952.700 0.878
Wald on restrs. 0.263 18727.343 0.608
NCV 1.494 76.357 0.225
SWnormal 1.333 3482.239 0.248
lagll 0.969 3370425.593 0.325
lagdd 1.506 3438.520 0.220
mnb fst v pval VIF
(Intercept) -2.089 0.003 38.590 0.959 NA
fyll 1.150 3.902 143.938 0.050 2.948 <-- keep
fydd 0.166 0.181 7639.121 0.670 3.536
AP1 -1.032 1.270 545.643 0.260 1.871
AP2 0.461 0.070 693.127 0.792 1.638
marrcaptives 0.901 0.120 62.218 0.730 1.838
plunder -2.830 0.320 109.557 0.573 1.806
cultints -0.673 0.091 125.740 0.763 2.730
roots 12.604 0.708 742.489 0.400 13.529
cereals 5.853 0.153 1469.957 0.695 16.601
gath -1.909 0.501 1628.870 0.479 2.437
plow -1.426 0.036 272312.902 0.850 3.262
hunt 1.792 0.498 969.412 0.480 3.648
fish -0.493 0.068 35519.524 0.794 2.282
anim -2.384 1.221 1901.243 0.269 4.181 <-- keep (tentative) VIF TOO HIGH
pigs 1.388 0.041 2172.703 0.839 2.406
milk -5.653 0.663 653491.534 0.416 3.501
bovines 2.246 0.096 4124.342 0.757 4.129
tree 3.206 0.034 2295.950 0.855 7.500
foodtrade -0.357 2.290 399.508 0.131 1.683 <-- keep
ecorich -0.992 0.265 737.507 0.607 1.606
popdens -2.238 1.114 516.971 0.292 3.110
exogamy 3.306 3.762 7018.488 0.052 1.366 <-- keep
ncmallow 1.095 1.268 19.050 0.274 1.576 <-- keep (tentative)
famsize -0.327 0.033 1349.584 0.855 1.527
settype -0.385 0.059 428.300 0.808 2.841
localjh -1.571 0.182 2907.841 0.670 1.601
moralgods -0.082 0.001 129752.699 0.970 2.165
fempower 0.172 0.020 22.125 0.888 1.415
war 0.198 0.388 127.192 0.534 1.488
himilexp 5.684 1.576 13660.481 0.209 1.502 <-- keep (tentative)
money 2.124 1.592 2468.882 0.207 2.064 <-- keep (tentative)
wagelabor -0.006 0.000 3.451 0.999 1.579
program replaced from 9A| above, ready for trim of xR<-
- Program 1 --> Program 2 below
#Program 1 Part A
#MI--make the imputed datasets
#--change the following path to the directory with your data and program--
setwd("C:/My Documents/MI")
rm(list=ls(all=TRUE))
options(echo=TRUE)
#--you need the following two packages--you must install them first--
library(foreign)
library(mice)
library(tripak)
library(zoo)
library(sp)
library(maptools)
library(spam))
#--for program 2 below
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#--To find the citation for a package, use this function:---
citation("mice")
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in auxiliary variables---
load("vaux.Rdata",.GlobalEnv)
row.names(vaux)<-NULL
#--Read in the SCCS dataset---
load("SCCS.Rdata",.GlobalEnv)
#--look at first 6 rows of vaux--
head(vaux)
#--look at field names of vaux--
names(vaux)
#--check to see that rows are properly aligned in the two datasets--
#--sum should equal 186---
sum((SCCS$socname==vaux$socname)*1)
#--remove the society name field--
vaux<-vaux[,-28]
names(vaux)
#--Two nominal variables: brg and rlg----
#--brg: consolidated Burton Regions-----
#0 = (rest of world) circumpolar, South and Meso-America, west North America
#1 = Subsaharan Africa
#2 = Middle Old World
#3 = Southeast Asia, Insular Pacific, Sahul
#4 = Eastern Americas
#--rlg: Religion---
#'0 (no world religion)'
#'1 (Christianity)'
#'2 (Islam)'
#'3 (Hindu/Buddhist)'
#--check to see number of missing values in vaux,
#--whether variables are numeric,
#--and number of discrete values for each variable---
vvn<-names(vaux)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(vaux[,vvn[i]])))
numeric<-is.numeric(vaux[,vvn[i]])
numDiscrVals<-length(table(vaux[,vvn[i]]))
pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
}
row.names(pp)<-vvn
pp
#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--extract variables to be used from SCCS, put in dataframe fx--
fx<-data.frame(
socname=SCCS$socname,socID=SCCS$"sccs#",
Agricultural=SCCS$v821,
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180),
lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182),
valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
fratgrpstr=SCCS$v570,
marrcaptives=SCCS$v870,
plunder=SCCS$v912,
###climate<-(SCCS$v857==1 | SCCS$v857==2 | SCCS$v857==6)*1+(SCCS$v857==3 | SCCS$v857==4)*2+(SCCS$v857==5)*3,
nonmatrel=SCCS$v52,
lrgfam=SCCS$v68,
exogamy=SCCS$v72,
famsize=SCCS$v80,
money=SCCS$v155,
popdens=SCCS$v156,
malesexag=SCCS$v175,
ndrymonth=SCCS$v196,
gath=SCCS$v203,
hunt=SCCS$v204,
fish=SCCS$v205,
anim=SCCS$v206,
brideprice=(SCCS$v208==1)*1,
nuclearfam=(SCCS$v210<=3)*1,
ncmallow=SCCS$v227,
cultints=SCCS$v232,
tree=(SCCS$v233==4)*1,
roots=(SCCS$v233==5)*1,
cereals=(SCCS$v233==6)*1,
settype=SCCS$v234,
localjh=(SCCS$v236-1),
superjh=SCCS$v237,
moralgods=SCCS$v238,
segadlboys=SCCS$v242,
plow=(SCCS$v243>1)*1,
pigs=(SCCS$v244==2)*1,
bovines=(SCCS$v244==7)*1,
milk=(SCCS$v245>1)*1,
agrlateboy=SCCS$v300,
fempower=SCCS$v663,
migr=(SCCS$v677==2)*1,
foodtrade=SCCS$v819,
dateobs=SCCS$v838,
rain=SCCS$v854,
temp=SCCS$v855,
ecorich=SCCS$v857,
#pctFemPolyg=SCCS$v872,
###femsubs=SCCS$v890, ###Dep var is added to get this ... so deleted
himilexp=(SCCS$v899==1)*1,
AP1=SCCS$v921,
AP2=SCCS$v928,
pathstress=SCCS$v1260,
war=SCCS$v1648,
foodscarc=SCCS$v1685,
sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
wagelabor=SCCS$v1732,
CVrain=SCCS$v1914/SCCS$v1913
)
#--look at first 6 rows of fx--
head(fx)
#--check to see number of missing values--
#--also check whether numeric--
vvn<-names(fx)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(fx[,vvn[i]])))
numeric<-is.numeric(fx[,vvn[i]])
pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
}
row.names(pp)<-vvn
pp
#--identify variables with missing values--
z<-which(pp[,1]>0)
zv1<-vvn[z]
zv1
#--identify variables with non-missing values--
z<-which(pp[,1]==0)
zv2<-vvn[z]
zv2
#Program 1 Part B
#-----------------------------
#----Multiple imputation------
#-----------------------------
#--number of imputed data sets to create--
#nimp<-10 #CHANGED TO TEST SPEEDUP
nimp<-3 ################################################################## speedup test
#--one at a time, loop through those variables with missing values--
for (i in 1:length(zv1)){
#--attach the imputand to the auxiliary data--
zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
#--in the following line, the imputation is done--
aqq<-complete(mice(zxx,maxit=20,m=nimp),action="long")
#--during first iteration of the loop, create dataframe impdat--
if (i==1){
impdat<-data.frame(aqq[,c(".id",".imp")])
}
#--the imputand is placed as a field in impdat and named--
impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
names(impdat)[NCOL(impdat)]<-zv1[i]
}
#--now the non-missing variables are attached to impdat--
gg<-NULL
for (i in 1:nimp){
gg<-rbind(gg,data.frame(fx[,zv2]))
}
impdat<-cbind(impdat,gg)
#--take a look at the top 6 and bottom 6 rows of impdat--
head(impdat)
tail(impdat)
#--impdat is saved as an R-format data file--
save(impdat,file="impdat.Rdata")
#Program 2 (12-18-2009)==
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
depvar<-SCCS$v821 ###log(1+SCCS$v821) ###
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
depvarname<-"PctFemContAg"
#--can add additional SCCS variable, but only if it has no missing values---
dateobs<-SCCS$v838
dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
indpv<-c("foodscarc",###"femsubs",
"fratgrpstr","marrcaptives","plunder",###"climate",
"exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
"war","himilexp","wagelabor",
"famsize","settype","localjh","money",
"cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich",
"popdens","pathstress","CVrain","rain","temp","AP1","AP2")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
#nimp<-10
nimp<-3 ################################################################## speedup test
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd++AP1+AP2+
fratgrpstr+marrcaptives+plunder+###climate+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+###femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)
#--OLS estimate of restricted model--
xR<-lm(depvar~fyll+fydd+###AP1+AP2+
###fratgrpstr+pigs+milk+bovines+tree+foodscarc+
###marrcaptives+plunder+roots###climate+
###cultints+cereals+gath+plow
###hunt+fish+
anim ###+###pigs+milk+bovines+tree+foodtrade+
###ecorich+popdens+exogamy+
###ncmallow ###+###famsize+
###settype+localjh+moralgods+fempower+###femsubs+
###war+wagelabor+
###himilexp ###+money######
,data=m9)
# fydd+ ### fyll+ <--- ADDED fydd for distance clustering
# fratgrpstr+marrcaptives+plunder+ ###climate+fish+
# plow+ ###cereals+gath+
# ###hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
# ###pathstress+ ecorich+popdens+exogamy+ncmallow+famsize+
# ###settype+localjh+superjh+moralgods+
# fempower+femsubs ###+
# ###war+himilexp +money+wagelabor ###sexratio+
# ,data=m9)
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
aaa<-c(table(depvar), NROW(depvar),depvarname)
imp<-"number of imputations nimp="
impute=c(imp,nimp)
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
lon[90]=-131
bbb
r2
ccc
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
bbb
aaa
impute
add all the non sign. var into the model
#Program 2 (12-18-2009)==
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
depvar<-SCCS$v821 ###log(1+SCCS$v821) ###
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
depvarname<-"PctFemContAg"
#--can add additional SCCS variable, but only if it has no missing values---
dateobs<-SCCS$v838
dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
indpv<-c("foodscarc",###"femsubs",
"fratgrpstr","marrcaptives","plunder",###"climate",
"exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
"war","himilexp","wagelabor",
"famsize","settype","localjh","money",
"cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich",
"popdens","pathstress","CVrain","rain","temp","AP1","AP2")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
#nimp<-10
nimp<-3 ################################################################## speedup test
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd++AP1+AP2+
fratgrpstr+marrcaptives+plunder+###climate+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+###femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)
#--OLS estimate of restricted model-- xR<-lm(depvar~fyll+fydd+anim+ wagelabor+war+fempower+moralgods+famsize+settype+money+tree+ ecorich+bovines+milk+pigs+fish+cereals+himilexp+popdens+ncmallow+ localjh+marrcaptives+plunder+plow ,data=m9) # fydd+ ### fyll+ <--- ADDED fydd for distance clustering # fratgrpstr ###climate+ # ###+gath+ # ###huntanim+++foodtrade+foodscarc+ # ###pathstress+ +exogamy # ###+superjh+ # +femsubs ###+ # ###war+wagelabor ###sexratio+ # ,data=m9)
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
aaa<-c(table(depvar), NROW(depvar),depvarname)
imp<-"number of imputations nimp="
impute=c(imp,nimp)
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
lon[90]=-131
bbb
r2
ccc
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
bbb
aaa
impute
run 1
r2
R2:final model R2:IV(distance) R2:IV(language)
0.3413441 0.9715283 0.9843970
> ccc
Fstat df pvalue
RESET 0.013 7600999.574 0.909
Wald on restrs. 0.149 1601.024 0.699
NCV 2.231 223.178 0.137
SWnormal 0.001 7550390.705 0.972
lagll 0.674 766837.254 0.412
lagdd 1.115 905362.164 0.291
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) -18.223 0.376 7.541460e+02 0.540 NA
fyll 1.494 8.611 7.632658e+03 0.003 2.348
fydd 0.158 0.185 1.766700e+03 0.668 3.098
anim -2.322 1.813 3.221159e+06 0.178 2.673
wagelabor 2.425 0.970 1.000292e+03 0.325 1.457
war 0.367 1.434 1.145640e+02 0.234 1.356
fempower -0.562 0.242 4.032000e+01 0.625 1.247-->high pval
moralgods 0.293 0.017 3.890900e+01 0.897 1.982--> high pval
famsize -0.177 0.010 2.277362e+04 0.919 1.425-->high pval
settype 0.264 0.042 2.846148e+08 0.838 1.945-->high pval
money 1.068 0.446 5.281598e+04 0.504 1.853
tree -8.484 1.140 7.128050e+02 0.286 1.505
ecorich -1.531 0.686 3.391136e+04 0.408 1.491
bovines 3.839 0.291 1.073881e+04 0.590 3.948
milk -4.831 0.475 1.505470e+02 0.492 3.303
pigs 4.867 0.555 5.303360e+02 0.457 2.127
fish -0.452 0.073 1.420144e+05 0.787 1.775-->high pval
cereals -5.410 1.176 3.649807e+04 0.278 1.852
himilexp 2.577 0.348 6.508700e+01 0.557 1.274
popdens -2.334 1.548 3.661776e+05 0.213 2.472
ncmallow 1.105 1.683 1.412040e+02 0.197 1.412
localjh -1.874 0.281 4.714552e+04 0.596 1.471
marrcaptives 1.775 0.512 1.727430e+02 0.475 1.730
plunder -3.188 0.379 1.311800e+01 0.549 1.559
plow -2.988 0.181 1.028542e+04 0.671 2.769-->high pval
> aaa
0 5 6 10 15
"8" "7" "1" "4" "9"
19 20 25 30 31
"2" "9" "9" "5" "2"
33 35 38 40 44
"1" "10" "1" "12" "2"
45 50 55 56 60
"9" "9" "6" "2" "4"
63 65 67 70 75
"1" "11" "1" "5" "3"
80 85 95 100
"3" "4" "1" "1" "142"
"PctFemContAg"
> impute
[1] "number of imputations nimp=" "3"
>
run 2
- Program 2 (12-18-2009)==
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
depvar<-SCCS$v821 ###log(1+SCCS$v821) ###
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
depvarname<-"PctFemContAg"
#--can add additional SCCS variable, but only if it has no missing values---
dateobs<-SCCS$v838
dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
indpv<-c("foodscarc",###"femsubs",
"fratgrpstr","marrcaptives","plunder",###"climate",
"exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
"war","himilexp","wagelabor",
"famsize","settype","localjh","money",
"cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich",
"popdens","pathstress","CVrain","rain","temp","AP1","AP2")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
#nimp<-10
nimp<-3 ################################################################## speedup test
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd++AP1+AP2+
fratgrpstr+marrcaptives+plunder+###climate+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+###femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)
#--OLS estimate of restricted model-- xR<-lm(depvar~fyll+fydd+anim+ wagelabor+war+money+tree+ ecorich+bovines+milk+pigs+cereals+himilexp+popdens+ncmallow+ localjh+marrcaptives+plunder ,data=m9) # fydd+ ### fyll+ <--- ADDED fydd for distance clustering # fratgrpstr ###climate+ # ###+gath+ # ###huntanim+++foodtrade+foodscarc+ # ###pathstress+ +exogamy # ###+superjh+ # +femsubs ###+ # ###war+wagelabor ###sexratio+ # ,data=m9)
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
aaa<-c(table(depvar), NROW(depvar),depvarname)
imp<-"number of imputations nimp="
impute=c(imp,nimp)
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
lon[90]=-131
bbb
r2
ccc
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
bbb
aaa
impute
results round2
R2:final model R2:IV(distance) R2:IV(language)
0.3341113 0.9715283 0.9843970
> ccc
Fstat df pvalue
RESET 0.032 194311.812 0.858
Wald on restrs. 0.149 1601.024 0.699
NCV 2.536 644.915 0.112
SWnormal 0.001 5281494.912 0.980
lagll 0.622 572952.563 0.430
lagdd 1.069 34882013.728 0.301
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) -24.385 0.936 101.704 0.336 NA
fyll 1.562 11.023 71482.352 0.001 2.096
fydd 0.167 0.251 733.487 0.617 2.610
anim -2.274 2.103 597844.287 0.147 2.298
wagelabor 2.673 1.270 692.565 0.260 1.399
war 0.348 1.426 116.112 0.235 1.275
money 1.232 0.685 832690.379 0.408 1.677
tree -7.039 0.884 6306.554 0.347 1.417
ecorich -1.508 0.741 19516.569 0.390 1.392
bovines 3.013 0.225 19377.422 0.635 3.274--> high
milk -4.477 0.499 1075.717 0.480 2.924--.high
pigs 5.057 0.684 301.301 0.409 1.916
cereals -4.392 0.912 25451007.489 0.340 1.642
himilexp 2.609 0.410 155.154 0.523 1.198-->high
popdens -2.563 2.331 10918.983 0.127 2.046
ncmallow 1.094 1.804 89.224 0.183 1.318
localjh -1.838 0.366 21002.568 0.545 1.127-->high
marrcaptives 1.853 0.615 82.557 0.435 1.583-->high
plunder -3.500 0.497 12.920 0.493 1.485-->high
> aaa
0 5 6 10 15
"8" "7" "1" "4" "9"
19 20 25 30 31
"2" "9" "9" "5" "2"
33 35 38 40 44
"1" "10" "1" "12" "2"
45 50 55 56 60
"9" "9" "6" "2" "4"
63 65 67 70 75
"1" "11" "1" "5" "3"
80 85 95 100
"3" "4" "1" "1" "142"
"PctFemContAg"
> impute
[1] "number of imputations nimp=" "3"
>
run 3
- Program 2 (12-18-2009)==
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
depvar<-SCCS$v821 ###log(1+SCCS$v821) ###
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
depvarname<-"PctFemContAg"
#--can add additional SCCS variable, but only if it has no missing values---
dateobs<-SCCS$v838
dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
indpv<-c("foodscarc",###"femsubs",
"fratgrpstr","marrcaptives","plunder",###"climate",
"exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
"war","himilexp","wagelabor",
"famsize","settype","localjh","money",
"cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich",
"popdens","pathstress","CVrain","rain","temp","AP1","AP2")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
#nimp<-10
nimp<-3 ################################################################## speedup test
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd++AP1+AP2+
fratgrpstr+marrcaptives+plunder+###climate+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+###femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)
#--OLS estimate of restricted model--
xR<-lm(depvar~fyll+fydd+anim+
wagelabor+war+money+tree+
ecorich+pigs+cereals+popdens+ncmallow
,data=m9)
# fydd+ ### fyll+ <--- ADDED fydd for distance clustering
# fratgrpstr ###climate+
# ###+gath+
# ###huntanim+++foodtrade+foodscarc+
# ###pathstress+ +exogamy
# ###+superjh+
# +femsubs ###+
# ###war+wagelabor ###sexratio+
# ,data=m9)
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
aaa<-c(table(depvar), NROW(depvar),depvarname)
imp<-"number of imputations nimp="
impute=c(imp,nimp)
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
lon[90]=-131
bbb
r2
ccc
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
bbb
aaa
impute
run 3 results
r2
R2:final model R2:IV(distance) R2:IV(language)
0.3109951 0.9715283 0.9843970
> ccc
Fstat df pvalue
RESET 0.141 58133.382 0.708
Wald on restrs. 0.149 1601.024 0.699
NCV 3.810 13984.923 0.051
SWnormal 0.004 102948.381 0.948
lagll 0.491 471582.214 0.484
lagdd 0.889 2150990.650 0.346
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) -36.386 3.652 7462087.063 0.056 NA
fyll 1.642 13.038 22734.652 0.000 1.983
fydd 0.260 0.687 305226.983 0.407 2.395
anim -2.429 3.545 5249510.920 0.060 1.577-->sign.
wagelabor 2.581 1.280 1289.262 0.258 1.320
war 0.531 3.951 67.923 0.051 1.060-->sign.
money 1.263 0.767 7536694.393 0.381 1.596-->remove
tree -6.587 0.819 35644.327 0.365 1.362-->remove
ecorich -1.114 0.432 11886309.109 0.511 1.327-->remove
pigs 5.311 1.030 17940.744 0.310 1.478
cereals -4.380 0.953 18041.723 0.329 1.576-->remove
popdens -2.678 3.083 677080.151 0.079 1.723-->sign
ncmallow 0.802 1.112 195.466 0.293 1.199
> aaa
0 5 6 10 15
"8" "7" "1" "4" "9"
19 20 25 30 31
"2" "9" "9" "5" "2"
33 35 38 40 44
"1" "10" "1" "12" "2"
45 50 55 56 60
"9" "9" "6" "2" "4"
63 65 67 70 75
"1" "11" "1" "5" "3"
80 85 95 100
"3" "4" "1" "1" "142"
"PctFemContAg"
> impute
[1] "number of imputations nimp=" "3"
>
run 4
- Program 2 (12-18-2009)==
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
depvar<-SCCS$v821 ###log(1+SCCS$v821) ###
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
depvarname<-"PctFemContAg"
#--can add additional SCCS variable, but only if it has no missing values---
dateobs<-SCCS$v838
dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
indpv<-c("foodscarc",###"femsubs",
"fratgrpstr","marrcaptives","plunder",###"climate",
"exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
"war","himilexp","wagelabor",
"famsize","settype","localjh","money",
"cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich",
"popdens","pathstress","CVrain","rain","temp","AP1","AP2")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
#nimp<-10
nimp<-3 ################################################################## speedup test
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd++AP1+AP2+
fratgrpstr+marrcaptives+plunder+###climate+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+###femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)
#--OLS estimate of restricted model--
xR<-lm(depvar~fyll+fydd+anim+
wagelabor+war+pigs+popdens+ncmallow
,data=m9)
# fydd+ ### fyll+ <--- ADDED fydd for distance clustering
# fratgrpstr ###climate+
# ###+gath+
# ###huntanim+++foodtrade+foodscarc+
# ###pathstress+ +exogamy
# ###+superjh+
# +femsubs ###+
# ###war+wagelabor ###sexratio+
# ,data=m9)
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
aaa<-c(table(depvar), NROW(depvar),depvarname)
imp<-"number of imputations nimp="
impute=c(imp,nimp)
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
lon[90]=-131
bbb
r2
ccc
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
bbb
aaa
impute
4th results
r2
R2:final model R2:IV(distance) R2:IV(language)
0.2977494 0.9715283 0.9843970
> ccc
Fstat df pvalue
RESET 0.843 87803.617 0.359
Wald on restrs. 0.149 1601.024 0.699
NCV 3.164 1750200.749 0.075
SWnormal 0.002 3418932.160 0.967
lagll 0.417 578834.079 0.519
lagdd 0.865 3077053.852 0.352
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) -36.969 4.114 5.379278e+13 0.043 NA
fyll 1.587 12.995 3.069155e+04 0.000 1.882
fydd 0.249 0.657 1.164491e+05 0.417 2.329
anim -2.481 3.921 2.821465e+07 0.048 1.505-->sign
wagelabor 2.689 1.503 8.424910e+02 0.221 1.229
war 0.502 3.674 6.949300e+01 0.059 1.031-->sign
pigs 6.339 1.796 2.222197e+04 0.180 1.222
popdens -2.636 4.832 6.405527e+04 0.028 1.076--.sign
ncmallow 0.585 0.621 1.838360e+02 0.432 1.153-->delete
> aaa
0 5 6 10 15
"8" "7" "1" "4" "9"
19 20 25 30 31
"2" "9" "9" "5" "2"
33 35 38 40 44
"1" "10" "1" "12" "2"
45 50 55 56 60
"9" "9" "6" "2" "4"
63 65 67 70 75
"1" "11" "1" "5" "3"
80 85 95 100
"3" "4" "1" "1" "142"
"PctFemContAg"
> impute
[1] "number of imputations nimp=" "3"
>
>
run 5
- Program 2 (12-18-2009)==
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MI")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("SCCS.Rdata",.GlobalEnv)
#--Read in two weight matrices--
ll<-as.matrix(read.dta("langwm.dta")[,-1])
dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
depvar<-SCCS$v821 ###log(1+SCCS$v821) ###
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
depvarname<-"PctFemContAg"
#--can add additional SCCS variable, but only if it has no missing values---
dateobs<-SCCS$v838
dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
indpv<-c("foodscarc",###"femsubs",
"fratgrpstr","marrcaptives","plunder",###"climate",
"exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
"war","himilexp","wagelabor",
"famsize","settype","localjh","money",
"cultints","roots","cereals","gath","hunt","fish",
"anim","pigs","milk","plow","bovines","tree","foodtrade",
"ndrymonth","ecorich",
"popdens","pathstress","CVrain","rain","temp","AP1","AP2")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
#nimp<-10
nimp<-3 ################################################################## speedup test
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
xly<-ll%*%xx
cyl<-ll%*%y
o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
fyll<-fitted(o)
#--keep R2 from this regression--
lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd,fyll)
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~fyll+fydd++AP1+AP2+
fratgrpstr+marrcaptives+plunder+###climate+
cultints+roots+cereals+gath+plow+
hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
settype+localjh+superjh+moralgods+fempower+###femsubs+
sexratio+war+himilexp+money+wagelabor
,data=m9)
#--OLS estimate of restricted model--
xR<-lm(depvar~fyll+fydd+anim+
war+wagelabor+popdens
,data=m9)
# fydd+ ### fyll+ <--- ADDED fydd for distance clustering
# fratgrpstr ###climate+
# ###+gath+
# ###huntanim+++foodtrade+foodscarc+
# ###pathstress+ +exogamy
# ###+superjh+
# +femsubs ###+
# ###war+wagelabor ###sexratio+
# ,data=m9)
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
"R2:final model","R2:IV(distance)","R2:IV(language)")
r2<-apply(dng[,7:9],2,mean)
adng<-dng[,1:6]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
aaa<-c(table(depvar), NROW(depvar),depvarname)
imp<-"number of imputations nimp="
impute=c(imp,nimp)
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
lon[90]=-131
bbb
r2
ccc
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
bbb
aaa
impute
final results
R2:final model R2:IV(distance) R2:IV(language)
0.2810274 0.9715283 0.9843970
Fstat df pvalue "PctFemContAg"
RESET 0.586 12218.688 0.444 <-no vars need log
Wald on restrs. 0.149 1601.024 0.699 <-no var missing
NCV 2.417 13632.135 0.120 <-no bunching
SWnormal 0.006 1981789.848 0.936 <-normal distrib errors
lagll 0.411 377470.146 0.522 <-no lang autocor
lagdd 0.832 1907553.379 0.362 <-no dist autocor
mnb fst v pval VIF
(Intercept) -30.645 2.942 417585.374 0.086 NA
fyll 1.399 10.982 75919.214 0.001 1.717
fydd 0.399 1.854 354805.414 0.173 2.099
anim -2.750 4.911 965958.604 0.027 1.464
war 0.446 2.968 83.698 0.089 1.010
wagelabor 2.332 1.151 1114.314 0.284 1.200 <-- drop for a final xR model
popdens -2.356 3.912 48757.127 0.048 1.052
0 5 6 10 15
"8" "7" "1" "4" "9"
19 20 25 30 31
"2" "9" "9" "5" "2"
33 35 38 40 44
"1" "10" "1" "12" "2"
45 50 55 56 60
"9" "9" "6" "2" "4"
63 65 67 70 75
"1" "11" "1" "5" "3"
80 85 95 100 "PctFemContAg"
"3" "4" "1" "1" "142"
3rd results negative with animals which makes good sense
As husbandry increase, and males do the husbandry, women's % contrib to agriculture drops, i.e., men also do more agriculture. The r2 here is quite good. Any other indep vars that might be predictive that are not already coded? Doug 15:10, 10 February 2010 (PST) I did this step for you since you were coming along nicely.
- Also tested adding plow, which has no effect.
R2:final model R2:IV(distance) R2:IV(language)
0.2391168 0.9721403 0.9850290
Fstat df pvalue
RESET 0.573 4439.743 0.449
Wald on restrs. 0.263 18727.343 0.608
NCV 2.128 1635098.385 0.145
SWnormal 0.650 89069.153 0.420
lagll 0.420 17233211.516 0.517
lagdd 0.822 10488.142 0.365
mnb fst v pval VIF
(Intercept) -28.481 3.571 346042.425 0.059 NA
fyll 1.410 11.324 54004.615 0.001 1.633
fydd 0.373 1.557 4202.249 0.212 2.097
anim -2.801 5.032 43145.334 0.025 1.427
plot language
lat<-SCCS$v833.1 lon<-SCCS$v833.2 plot(lon,lat, cex=.1) z=depvar ztxt=as.character(z) text(lon,lat,ztxt)
