EduMod86 Jajmani
EduMod85 has unrestricted models. Here the models that are further restricted are so labelled. Note that fx<- is defined only in Program 1.
- Orans, Martin. 1968, Maximizing in Jajmaniland: A Model of Caste Relations American Anthropologist 70(5): 875–897. pdf This articles has 7 variables coded for one or both of two time periods for 39 Indian villages, located geographically, and with probably calculable linguistic phylogeny. FIFTEEN ARE CODED FOR BOTH TIME PERIODS (p 891). Can we predict causality from time period 1 and predict what will happen at time 2?
- Authors using causal analysis: Ren Feng, Tolga Oztan and Douglas R. White.
edit the csv file
setwd("C:/My Documents/MIjaj")
jajman<-read.csv("jaj.csv")
save(jajman,file="jajman.Rdata")
EduMod85 - EduMod86
- Graph of causal analysis results
*Vertices 4 1 "Isolation" 0.0702 0.5000 0.5000 2 "Hierarchy" 0.3671 0.1330 0.5000 3 "Ritual-Secular Correl" 0.7035 0.5000 0.5000 4 "Pol Power Concentration" 0.3694 0.8853 0.5000 *Arcs 1 2 1 c Black l p<0.16-coef_0.315 2 3 2 c Black l p-value<0.10-coef_0.523 3 2 3 c Black l p-value<0.03-coef_0.496 3 4 2 c Red l p-value<0.06-coef_-.344 4 3 3 c Red l p-value<0.04-coef_-.683
Because of the small sample size, we regard the weaker effects of Isolation as significant. Thus these two weak effects (e.g., improvement of roads to market towns) as the drivers of changes in political power concentration and village hiearch, which interact with the ritual-secular correlation with reverberating (reciprocally causal) effects.
AS A PREDICTION FOR THE SECOND TIME PERIOD, we would expect changes in reducing isolation at that time period to affect all three other variables.
TEST OF PREDICTIONS:
- Decreased Isolation --> Decreased Hierarchy p-value = N.S. (Fisher Exact Test) p-value > .27, but this is part of the indirect path to Ritual / Secular Correlation (N=11)
- Decreased Isolation --> Decreased Ritual / Secular Correlation p-value = .055 (Fisher Exact Test, N=11)
- Decreased Isolation --> Decreased Political Power Concentration p-value = .015 (Fisher Exact Test, N=14)
- Decreased Hierarchy <-> Decreased Ritual / Secular Correlation p-value = .067 (Fisher Exact Test, N=10)
- Decreased Political Power Concentration <-> Decreased Ritual / Secular Correlation p-value = .067 (Fisher Exact Test, N=11)
Maximizing in Jajmaniland: A Model of Caste Relations
MARTIN ORANS. American Anthropologist 70(5):875-897.
Article first published online: 28 OCT 2009.
P. 880: Our model of a caste system contains the following
types of hierarchies:
(1) Political hierarchy (PH)
(2) Economic hierarchy (EH)
(3) Interactional hierarchy (IH)
(4) Attributional hierarchy (AH)
Since data on the attributional hierarchy
was too scarce it was dropped from further
consideration.
Our observables are as follows:
(1) Political power concentration (P)
(2) Stability (S)
(3) Ritual secular rank correlation
(R) = (1H)-(PH) and (1H)-(EH) fit
(4) Jajmani-market continuum (J)
(5) Consensus on rank (C)
(6) Hierarchy equality continuum (H)
(7) Isolation (I)
next step
next step EduMod86 with the new program package.
- Allows 1 or 2 W matrices
- Run restricted models
- Compute regression coefficients
- Compute total effects (adding indirect path)
- Is the reciprocal effect x-->y plus y-->x-->y as the indirect path?
- Correlate total effects to delta x <--> delta y (time lagged correlations)
Problems of moderation variables: Multiplicative interaction terms
The findings of Orans all involve multiplicative interaction terms, as discussed by Lowell Hargens. 2008. Product-Variable Models of Interaction Effects and Causal Mechanisms Working Paper no. 67R. Center for Statistics and the Social Sciences. University of Washington.
Multiplicative interaction terms are being tested for predictiveness in some of the regressions listed below in Program 1 and used for dependent variables ppc, rsc, and jr.
PREppc_i_jr=jaj$i*jaj$jr, #Orans p. 880 (1) PRErsc_ppc_jr=jaj$ppc*jaj$jr, #Orans p. 881 (3) PREjr_i_ppc=jaj$i*jaj$ppc, #Orans p. 881 (4)
South Asian (Indian) villages
The 15 cases with + are those with data on two time periods. These are used to analyze causality and make predictions from changes in transport systems about changes in the time-two variables.
1 Anigara 2 Bhadkad 3 Bhuvel 4 + Fatehpura 5 Gopalpur(Beals) 6 + Hatarahalli 7 Karimpur 8 + Kishan Garhi 9 + Kumbapettai 10 + Madhopur 11 + Oriya Hill 12 Rampura 13 Swat 14 + Terutenne 15 Tottagadde 16 Brahman Village 17 Daharpur 18 Desert Village 19 Dewara 20 Jaurasi 21 Jhabiran 22 Kabirpur 23 + Khalapur 24 Majra 25 Mohana 26 + Potlod 27 Radhvanaj 28 Ramkheri 29 Shamirpet 30 + Gaon 31 + Gopalpur (Khare) 32 + Chirruppiddi 33 Kanchanpur 34 + Kota 35 Mohla 36 Pelpala 37 + Rani Khera 38 + Sirkanda 39 W. Bengal
program 1 multiple imputation
setwd("C:/My Documents/MIjaj")
#Program 1 --> Program 2 below
#Program 1 Part A
#MI--make the imputed datasets
#--change the following path to the directory with your data and program--
setwd("C:/My Documents/MIjaj")
rm(list=ls(all=TRUE))
options(echo=TRUE)
#--you need the following two packages--you must install them first--
library(foreign)
library(mice)
library(tripack)
library(zoo)
library(sp)
library(maptools)
library(spam)
#--for program 2 below
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#--To find the citation for a package, use this function:---
citation("mice")
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in auxiliary variables---
load("vaux.Rdata",.GlobalEnv)
row.names(vaux)<-NULL
#--Read in the SCCS dataset---
load("jaj.Rdata",.GlobalEnv)
#--look at first 6 rows of vaux--
head(vaux)
#--look at field names of vaux--
names(vaux)
#--check to see that rows are properly aligned in the two datasets--
#--sum should equal 186---
sum((jaj$socinam==vaux$socinam)*1)
#--remove the society name field--
vaux<-vaux[,-1]
vaux<-vaux[,-1]
names(vaux)
#--Two nominal variables: brg and rlg----
#--brg: consolidated Burton Regions-----
#0 = (rest of world) circumpolar, South and Meso-America, west North America
#1 = Subsaharan Africa
#2 = Middle Old World
#3 = Southeast Asia, Insular Pacific, Sahul
#4 = Eastern Americas
#--rlg: Religion---
#'0 (no world religion)'
#'1 (Christianity)'
#'2 (Islam)'
#'3 (Hindu/Buddhist)'
#--check to see number of missing values in vaux,
#--whether variables are numeric,
#--and number of discrete values for each variable---
vvn<-names(vaux)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(vaux[,vvn[i]])))
numeric<-is.numeric(vaux[,vvn[i]])
numDiscrVals<-length(table(vaux[,vvn[i]]))
pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
}
row.names(pp)<-vvn
pp
#MODIFY THESE STATEMENTS FOR A NEW PROJECT
#--extract variables to be used from jaj, put in dataframe fx--
fx<-data.frame(
socinam=jaj$socinam,socID=jaj$code,
PREppc_i_jr=jaj$i*jaj$jr, #Orans p. 880 (1)
PRErsc_ppc_jr=jaj$ppc*jaj$jr, #Orans p. 881 (3)
PREjr_i_ppc=jaj$i*jaj$ppc, #Orans p. 881 (4)
ppc=jaj$ppc,
s=jaj$s,
rsc=jaj$rsc,
jr=jaj$jr,
c=jaj$c,
h=jaj$h,
i=jaj$i)
#--look at first 6 rows of fx--
head(fx)
#--check to see number of missing values--
#--also check whether numeric--
vvn<-names(fx)
pp<-NULL
for (i in 1:length(vvn)){
nmiss<-length(which(is.na(fx[,vvn[i]])))
numeric<-is.numeric(fx[,vvn[i]])
pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
}
row.names(pp)<-vvn
pp
#--identify variables with missing values--
z<-which(pp[,1]>0)
zv1<-vvn[z]
zv1
#--identify variables with non-missing values--
z<-which(pp[,1]==0)
zv2<-vvn[z]
zv2
#Program 1 Part B
#-----------------------------
#----Multiple imputation------
#-----------------------------
#--number of imputed data sets to create--
nimp<-10 #CHANGED TO TEST SPEEDUP
#--one at a time, loop through those variables with missing values--
for (i in 1:length(zv1)){
#--attach the imputand to the auxiliary data--
zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
#--in the following line, the imputation is done--
aqq<-complete(mice(zxx,maxit=20,m=nimp),action="long")
###aqq<-complete(mice(zxx,maxit=100,m=nimp),action="long")
#--during first iteration of the loop, create dataframe impdat--
if (i==1){
impdat<-data.frame(aqq[,c(".id",".imp")])
}
#--the imputand is placed as a field in impdat and named--
impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
names(impdat)[NCOL(impdat)]<-zv1[i]
}
#--now the non-missing variables are attached to impdat--
gg<-NULL
for (i in 1:nimp){
gg<-rbind(gg,data.frame(fx[,zv2]))
}
impdat<-cbind(impdat,gg)
#--take a look at the top 6 and bottom 6 rows of impdat--
head(impdat)
tail(impdat)
#--impdat is saved as an R-format data file--
save(impdat,file="impdat.Rdata")
Results of Program 1
Fstat df pvalue
RESET 12.075 18774.72 0.001
Wald on restrs. 0.974 12623.75 0.324
NCV 0.114 821012.64 0.736
SWnormal 0.026 33749344.83 0.872
lagdd 0.011 289669.37 0.917
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) 14.225 39.450 1719930 0.000 NA
fydd -2.639 24.306 1826119 0.000 1.155
rsc -0.321 4.773 8465619 0.029 1.096
PREppc_i_jr 0.035 2.159 6692293029 0.142 1.073
vaux ppc s r-sc jr c h 1 3 1 3.0 3.0 1 3 2 3 1 3.0 2.0 1 3 3 4 1 3.0 4.0 1 3 4 4 1 3.0 3.0 1 3 5 4 1 3.0 4.0 1 3 6 4 1 3.0 3.0 1 3 7 4 1 4.0 3.0 1 4 8 4 1 3.5 3.0 1 4 9 3 1 4.0 4.0 1 3 10 4 1 3.0 4.0 1 4 11 3 1 4.0 3.5 1 4 12 4 1 4.0 4.0 1 4 13 4 1 4.0 4.0 1 4 14 4 1 3.0 4.0 1 4 15 4 1 4.0 4.0 1 4
jaj$ppc jaj$rsc jaj$jr jaj$c jaj$h jaj$i jaj$ppc [1] 3 3 4 4 4 4 4 4 3 4 3 4 4 4 4 jaj$s [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 jaj$rsc NULL jaj$jr [1] 3.0 2.0 4.0 3.0 4.0 3.0 3.0 3.0 4.0 4.0 3.5 4.0 4.0 4.0 4.0 jaj$c [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 jaj$h [1] 3 3 3 3 3 3 4 4 3 4 4 4 4 4 4 jaj$i [1] 3.0 NA 4.0 2.0 3.5 4.0 4.0 3.5 3.0 4.0 3.5 4.0 3.0 4.0 3.5
Restricted Program for Dependent variable: PPC
setwd("C:/My Documents/MIjaj")
#Run Program 1 (Multiple Imputation) first
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MIjaj")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("jaj.Rdata",.GlobalEnv)
#--Read in two weight matrices—
load("matrix.Rdata",.GlobalEnv)
#ll<-as.matrix(matrix)
dd<-as.matrix(matrix)
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
#depvar<-SCCS$v860
depvar<-jaj$ppc
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
#depvarname<-"polygyny"
depvarname<-"ppc"
#--can add additional SCCS variable, but only if it has no missing values---
#dateobs<-SCCS$v838
#dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
#diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
#ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
#ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
#rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
#wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
#indpv<-c("pre_mar_sex", "money", "foodstress", "femctrldwellg", "wealthy",
indpv<-c("s","rsc","jr","c","h","i","PREppc_i_jr" )
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
#nimp<-10
nimp<-3 ################################################################## speedup test
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
#xly<-ll%*%xx
#cyl<-ll%*%y
#o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
#fyll<-fitted(o)
#--keep R2 from this regression--
#lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd)
#,fyll
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~ fydd+ #fyll +
rsc+jr+h+i+PREppc_i_jr
,data=m9) # s++c
#--OLS estimate of restricted model--
xR<-lm(depvar~ fydd+ # fyll+
rsc+PREppc_i_jr #Orans p. 880 (1)
#xR: rsc+jr+h+i
,data=m9) # +c +s
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
#qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("i","jr","h")#"s","c","h",)
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
#o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
#p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p6,p7,dr2)) #p5, ,lr2
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagdd",
"R2:final model","R2:IV(distance)") #"lagll",,"R2:IV(language)"
r2<-apply(dng[,6:7],2,mean)
adng<-dng[,1:5]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
aaa<-c(table(depvar), NROW(depvar),depvarname)
imp<-"number of imputations nimp="
impute=c(imp,nimp)
bbb
r2
ccc
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
bbb
aaa
impute
A|Unrestricted Results for Dependent variable: PPC
> bbb
coef Fstat ddf pvalue VIF
(Intercept) 13.83054379 26.53142267 Inf 0.00000026 NA
fydd -2.65674917 18.40084616 Inf 0.00001790 1.538
rsc -0.34427058 3.64956991 Inf 0.05608369 1.646
jr -0.00256108 0.00035881 Inf 0.98488721 1.484
h 0.05727861 0.08043588 Inf 0.77670759 2.248
i 0.22726288 2.41655021 Inf 0.12005907 1.493
> r2
??: ?????'r2'
> ccc
Fstat df pvalue
RESET 15.787 Inf 0.000
Wald on restrs. 2.153 Inf 0.142
NCV 0.448 Inf 0.503
SWnormal 0.000 Inf 0.992
lagdd 0.047 Inf 0.829
R2:final model 0.792 Inf 0.373
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) 13.831 26.531 Inf 0.000 NA
fydd -2.657 18.401 Inf 0.000 1.538
rsc -0.344 3.650 Inf 0.056 1.646
jr -0.003 0.000 Inf 0.985 1.484
h 0.057 0.080 Inf 0.777 2.248
i 0.227 2.417 Inf 0.120 1.493
> aaa
3 4
"4" "11" "15" "ppc"
> impute
[1] "number of imputations nimp=" "3"
B|Results for Orans' interaction Dependent variable: PPC
> bbb
coef Fstat ddf pvalue VIF
(Intercept) 14.22512593 39.449993 1719930 0.00000000 NA
fydd -2.63906639 24.305540 1826119 0.00000082 1.155
rsc -0.32073322 4.772614 8465619 0.02891592 1.096
PREppc_i_jr 0.03534158 2.159460 6692293029 0.14169451 1.073
> r2
0.7468
> ccc
Fstat df pvalue
RESET 12.075 1.877472e+04 0.001
Wald on restrs. 0.974 1.262375e+04 0.324
NCV 0.114 8.210126e+05 0.736
SWnormal 0.026 3.374934e+07 0.872
lagdd 0.011 2.896694e+05 0.917
R2:final model 0.747 1.520045e+18 0.387
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) 14.225 39.450 1719930 0.000 NA
fydd -2.639 24.306 1826119 0.000 1.155
rsc -0.321 4.773 8465619 0.029 1.096
PREppc_i_jr 0.035 2.159 6692293029 0.142 1.073
> aaa
3 4
"4" "11" "15" "ppc"
> impute
[1] "number of imputations nimp=" "3"
>
Restricted Program for Dependent variable: RSC
setwd("C:/My Documents/MIjaj")
#Program 2 (12-18-2009)==
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MIjaj")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("jaj.Rdata",.GlobalEnv)
#--Read in two weight matrices—
load("matrix.Rdata",.GlobalEnv)
#ll<-as.matrix(matrix)
dd<-as.matrix(matrix)
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
#depvar<-SCCS$v860
depvar<-jaj$rsc
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
#depvarname<-"polygyny"
depvarname<-"rsc"
#--can add additional SCCS variable, but only if it has no missing values---
#dateobs<-SCCS$v838
#dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
#diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
#ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
#ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
#rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
#wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
#indpv<-c("pre_mar_sex", "money", "foodstress", "femctrldwellg", "wealthy",
indpv<-c("s","ppc","i","c","jr","h","PRErsc_ppc_jr")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
#nimp<-10
nimp<-3 ################################################################## speedup test
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
#xly<-ll%*%xx
#cyl<-ll%*%y
#o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
#fyll<-fitted(o)
#--keep R2 from this regression--
#lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd)
#,fyll
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~ fydd+ #fyll +
PRErsc_ppc_jr+ppc+i+jr+h
,data=m9) # s++c
#--OLS estimate of restricted model--
xR<-lm(depvar~ fydd+ # fyll+
PRErsc_ppc_jr+ppc+i+h #Orans p. 881 (3)
#xR: ppc+i+jr+h
,data=m9) # +c +s
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
#qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("jr")#"s","c","h",)
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
#o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
#p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p6,p7,dr2)) #p5, ,lr2
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagdd",
"R2:final model","R2:IV(distance)") #"lagll",,"R2:IV(language)"
r2<-apply(dng[,6:7],2,mean)
adng<-dng[,1:5]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
aaa<-c(table(depvar), NROW(depvar),depvarname)
imp<-"number of imputations nimp="
impute=c(imp,nimp)
bbb
r2
ccc
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
bbb
aaa
impute
A|Unrestricted Results for Dependent variable: RSC
> bbb
coef Fstat ddf pvalue VIF
(Intercept) 14.82646838 3.24132922 Inf 0.07180236 NA
fydd -3.26175731 2.31907310 Inf 0.12779616 2.538
ppc -0.68331500 4.49459614 Inf 0.03400214 2.188
i 0.02109887 0.00812827 Inf 0.92816251 1.861
jr 0.09743853 0.25744480 Inf 0.61188107 1.456
h 0.52319112 4.84771429 Inf 0.02768279 1.513
> r2
??: ?????'r2'
> ccc
Fstat df pvalue
RESET 1.307 Inf 0.253
Wald on restrs. 0.007 Inf 0.934
NCV 0.914 Inf 0.339
SWnormal 1.778 Inf 0.182
lagdd 0.052 Inf 0.820
R2:final model 0.635 Inf 0.426
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) 14.826 3.241 Inf 0.072 NA
fydd -3.262 2.319 Inf 0.128 2.538
ppc -0.683 4.495 Inf 0.034 2.188
i 0.021 0.008 Inf 0.928 1.861
jr 0.097 0.257 Inf 0.612 1.456
h 0.523 4.848 Inf 0.028 1.513
> aaa
3 3.5 4
"8" "1" "6" "15" "rsc"
> impute
[1] "number of imputations nimp=" "3"
>
B|Restricted results for Orans' interaction Dependent variable: RSC
> bbb
coef Fstat ddf pvalue VIF
(Intercept) 15.08104859 5.0876433 4.947409e+09 0.02409692 NA
fydd -3.26028815 3.4873587 4.041311e+09 0.06183918 1.825
PRErsc_ppc_jr 0.01738321 0.1273972 1.670190e+10 0.72114611 2.559
ppc -0.72032843 4.6092533 1.051622e+14 0.03179988 2.566
h 0.54127606 6.8630774 3.367882e+12 0.00879949 1.239
> r2
R2:final model R2:IV(distance)
0.6252065 0.9707950
> ccc
Fstat df pvalue
RESET 1.381 1.019231e+09 0.240
Wald on restrs. 1.129 1.716493e+06 0.288
NCV 0.715 8.702922e+08 0.398
SWnormal 1.489 9.411673e+06 0.222
lagdd 0.048 3.280131e+12 0.826
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) 15.081 5.088 4.947409e+09 0.024 NA
fydd -3.260 3.487 4.041311e+09 0.062 1.825
PRErsc_ppc_jr 0.017 0.127 1.670190e+10 0.721 2.559
ppc -0.720 4.609 1.051622e+14 0.032 2.566
h 0.541 6.863 3.367882e+12 0.009 1.239
> aaa
3 3.5 4
"8" "1" "6" "15" "rsc"
> impute
[1] "number of imputations nimp=" "3"
>
>
Program for Dependent variable: JR
#Program 2 (12-18-2009)==
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MIjaj")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("jaj.Rdata",.GlobalEnv)
#--Read in two weight matrices—
load("matrix.Rdata",.GlobalEnv)
#ll<-as.matrix(matrix)
dd<-as.matrix(matrix)
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
#depvar<-SCCS$v860
depvar<-jaj$jr
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
#depvarname<-"polygyny"
depvarname<-"jr"
#--can add additional SCCS variable, but only if it has no missing values---
#dateobs<-SCCS$v838
#dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
#diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
#ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
#ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
#rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
#wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
#indpv<-c("pre_mar_sex", "money", "foodstress", "femctrldwellg", "wealthy",
indpv<-c("s","ppc","rsc","c","h","i","PREjr_i_ppc")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
#nimp<-10
nimp<-3 ################################################################## speedup test
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
#xly<-ll%*%xx
#cyl<-ll%*%y
#o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
#fyll<-fitted(o)
#--keep R2 from this regression--
#lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd)
#,fyll
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~ fydd+ #fyll +
PREjr_i_ppc+ppc+rsc+h+i
,data=m9) # s++c
#--OLS estimate of restricted model--
xR<-lm(depvar~ fydd+ # fyll+
PREjr_i_ppc+ppc+rsc+h+i #Orans p. 881 (4)
,data=m9) # +c +s
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
#qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("i")#"s","c","h",)
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
#o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
#p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p6,p7,dr2)) #p5, ,lr2
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagdd",
"R2:final model","R2:IV(distance)") #"lagll",,"R2:IV(language)"
r2<-apply(dng[,6:7],2,mean)
adng<-dng[,1:5]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
aaa<-c(table(depvar), NROW(depvar),depvarname)
imp<-"number of imputations nimp="
impute=c(imp,nimp)
bbb
r2
ccc
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
bbb
aaa
impute
A|Unrestricted Results for Dependent variable: JR
> bbb
coef Fstat ddf pvalue VIF
(Intercept) 9.14903875 0.37011877 Inf 0.5429395 NA
fydd -2.12487570 0.40246568 Inf 0.5258186 4.765
ppc 0.02505157 0.00091599 Inf 0.9758555 5.123
rsc 0.26373068 0.24537704 Inf 0.6203492 2.481
h -0.10795689 0.05044716 Inf 0.8222870 2.199
i 0.33586702 0.87381609 Inf 0.3499010 1.558
> r2
??: ?????'r2'
> ccc
Fstat df pvalue
RESET 1.591 Inf 0.207
Wald on restrs. 0.814 Inf 0.367
NCV 0.001 Inf 0.973
SWnormal 1.618 Inf 0.203
lagdd 0.023 Inf 0.879
R2:final model 0.358 Inf 0.550
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) 9.149 0.370 Inf 0.543 NA
fydd -2.125 0.402 Inf 0.526 4.765
ppc 0.025 0.001 Inf 0.976 5.123
rsc 0.264 0.245 Inf 0.620 2.481
h -0.108 0.050 Inf 0.822 2.199
i 0.336 0.874 Inf 0.350 1.558
> aaa
2 3 3.5 4
"1" "5" "1" "8" "15" "jr"
> impute
[1] "number of imputations nimp=" "3"
>
B|Restricted Results for Orans' Interaction Dependent variable: JR ???
mnb fst v pval VIF (Intercept) 9.587 2.711 722961.7 0.100 NA fydd -1.937 1.539 688952.3 0.215 1.273 PREjr_i_ppc 0.053 0.934 23144824.0 0.334 1.273
> bbb
coef Fstat ddf pvalue VIF
(Intercept) 10.54064428 0.46407150 6.968247e+04 0.4957295 NA
fydd -2.20570143 0.43209247 7.117583e+04 0.5109661 4.760
PREjr_i_ppc 0.08253526 0.85017788 3.096739e+06 0.3565021 2.795
ppc -0.25188458 0.06657835 1.384781e+05 0.7963858 7.095
rsc 0.25405278 0.22573567 3.812591e+05 0.6347042 2.494
h -0.09471073 0.03924907 9.062682e+12 0.8429560 2.170
> r2
R2:final model R2:IV(distance)
0.3564793 0.9909527
> ccc
Fstat df pvalue
RESET 1.730 1.481105e+09 0.188
Wald on restrs. 0.094 4.202000e+00 0.773
NCV 0.001 4.964051e+07 0.980
SWnormal 1.580 9.784276e+03 0.209
lagdd 0.020 1.788264e+07 0.888
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) 10.541 0.464 6.968248e+04 0.496 NA
fydd -2.206 0.432 7.117583e+04 0.511 4.760
PREjr_i_ppc 0.083 0.850 3.096739e+06 0.357 2.795
ppc -0.252 0.067 1.384781e+05 0.796 7.095
rsc 0.254 0.226 3.812591e+05 0.635 2.494
h -0.095 0.039 9.062682e+12 0.843 2.170
> aaa
2 3 3.5 4
"1" "5" "1" "8" "15" "jr"
> impute
[1] "number of imputations nimp=" "3"
>
>
Restricted Program for Dependent variable: H
setwd("C:/My Documents/MIjaj")
#Program 2 (12-18-2009)==
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MIjaj")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("jaj.Rdata",.GlobalEnv)
#--Read in two weight matrices—
load("matrix.Rdata",.GlobalEnv)
#ll<-as.matrix(matrix)
dd<-as.matrix(matrix)
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
#depvar<-SCCS$v860
depvar<-jaj$h
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
#depvarname<-"polygyny"
depvarname<-"h"
#--can add additional SCCS variable, but only if it has no missing values---
#dateobs<-SCCS$v838
#dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
#diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
#ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
#ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
#rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
#wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
#indpv<-c("pre_mar_sex", "money", "foodstress", "femctrldwellg", "wealthy",
indpv<-c("s","ppc","rsc","c","jr","i")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
#nimp<-10
nimp<-3 ################################################################## speedup test
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
#xly<-ll%*%xx
#cyl<-ll%*%y
#o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
#fyll<-fitted(o)
#--keep R2 from this regression--
#lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd)
#,fyll
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~ fydd+ #fyll +
ppc+rsc+jr+i
,data=m9) # s++c
#--OLS estimate of restricted model--
xR<-lm(depvar~ fydd+ # fyll+
rsc+i
#xR: ppc+rsc+jr+i
,data=m9) # +c +s
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
#qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("i")#"s","c","h",)
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
#o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
#p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p6,p7,dr2)) #p5, ,lr2
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagdd",
"R2:final model","R2:IV(distance)") #"lagll",,"R2:IV(language)"
r2<-apply(dng[,6:7],2,mean)
adng<-dng[,1:5]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
aaa<-c(table(depvar), NROW(depvar),depvarname)
imp<-"number of imputations nimp="
impute=c(imp,nimp)
bbb
r2
ccc
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
bbb
aaa
impute
Results for Dependent variable: H
> bbb
coef Fstat ddf pvalue VIF
(Intercept) 7.6512780 2.779751 3429.773 0.09555431 NA
fydd -1.9278563 2.607508 2170.964 0.10650413 1.113
rsc 0.4836723 5.186153 77016.186 0.02277029 1.095
i 0.2985883 2.924874 96481.639 0.08722723 1.020
> r2
R2:final model R2:IV(distance)
0.5842396 0.9856866
> ccc
Fstat df pvalue
RESET 0.967 7.477032e+05 0.325
Wald on restrs. 1.776 2.741738e+05 0.183
NCV 0.086 9.287240e+02 0.770
SWnormal 0.188 1.332419e+05 0.665
lagdd 0.019 5.985799e+09 0.889
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) 7.651 2.780 3429.773 0.096 NA
fydd -1.928 2.608 2170.964 0.107 1.113
rsc 0.484 5.186 77016.186 0.023 1.095
i 0.299 2.925 96481.639 0.087 1.020
> aaa
3 4
"7" "8" "15" "h"
> impute
[1] "number of imputations nimp=" "3"
>
Program for Dependent variable: I
#Program 2 (12-18-2009)==
#MI--estimate model, combine results
rm(list=ls(all=TRUE))
#--Set path to your directory with data and program--
setwd("C:/My Documents/MIjaj")
options(echo=TRUE)
#--need these packages for estimation and diagnostics--
library(foreign)
library(spdep)
library(car)
library(lmtest)
library(sandwich)
#-----------------------------
#--Read in data, rearrange----
#-----------------------------
#--Read in original SCCS data---
load("jaj.Rdata",.GlobalEnv)
#--Read in two weight matrices—
load("matrix.Rdata",.GlobalEnv)
#ll<-as.matrix(matrix)
dd<-as.matrix(matrix)
#--Read in the imputed dataset---
load("impdat.Rdata",.GlobalEnv)
#--create dep.varb. you wish to use from SCCS data--
#--Here we sum variables measuring how much a society values children--
#depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
#depvar<-SCCS$v860
depvar<-jaj$i
#--find obs. for which dep. varb. is non-missing--
zdv<-which(!is.na(depvar))
depvar<-depvar[zdv]
#HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
#depvarname<-"polygyny"
depvarname<-"i"
#--can add additional SCCS variable, but only if it has no missing values---
#dateobs<-SCCS$v838
#dateobs<-dateobs[zdv]
#--look at histogram and frequencies for the dep. varb.--
hist(depvar)
table(depvar)
#--modify weight matrices---
#--set diagonal equal to zeros--
#diag(ll)<-0
diag(dd)<-0
#--use only obs. where dep. varb. non-missing--
#ll<-ll[zdv,zdv]
dd<-dd[zdv,zdv]
#--row standardize (rows sum to one)
#ll<-ll/rowSums(ll)
dd<-dd/rowSums(dd)
#--check to see that rows sum to one
#rowSums(ll)
rowSums(dd)
#--make weight matrix object for later autocorrelation test--
#wmatll<-mat2listw(as.matrix(ll))
wmatdd<-mat2listw(as.matrix(dd))
#indpv<-c("pre_mar_sex", "money", "foodstress", "femctrldwellg", "wealthy",
indpv<-c("s","ppc","rsc","c","jr","h")
#-----------------------------------------------------
#---Estimate model on each imputed dataset------------
#-----------------------------------------------------
#--number of imputed datasets--
#nimp<-10
nimp<-3 ################################################################## speedup test
#--will append values to these empty objects--
vif<-NULL
ss<-NULL
beta<-NULL
dng<-NULL
#--loop through the imputed datasets--
for (i in 1:nimp){
#--select the ith imputed dataset--
m9<-impdat[which(impdat$.imp==i),]
#--retain only obs. for which dep. varb. is nonmissing--
m9<-m9[zdv,]
#--create spatially lagged dep. varbs.--
y<-as.matrix(depvar)
xx<-as.matrix(m9[,indpv])
#--for instruments we use the spatial lag of our indep. varbs.--
#--First, the spatially lagged varb. for distance--
xdy<-dd%*%xx
cyd<-dd%*%y
o<-lm(cyd~xdy)
#--the fitted value is our instrumental variable--
fydd<-fitted(o)
#--keep R2 from this regression--
dr2<-summary(o)$r.squared
#--Then, the spatially lagged varb. for language--
#xly<-ll%*%xx
#cyl<-ll%*%y
#o<-lm(cyl~xly)
#--the fitted value is our instrumental variable--
#fyll<-fitted(o)
#--keep R2 from this regression--
#lr2<-summary(o)$r.squared
m9<-cbind(m9,fydd)
#,fyll
#--OLS estimate of unrestricted model--
xUR<-lm(depvar~ fydd+ #fyll +
ppc+rsc+jr+h
,data=m9) # s++c
#--OLS estimate of restricted model--
xR<-lm(depvar~ fydd+ # fyll+
ppc+rsc+jr+h
,data=m9) # +c +s
#--corrected sigma2 and R2 for 2SLS--
qxx<-m9
qxx[,"fydd"]<-cyd
#qxx[,"fyll"]<-cyl
b<-coef(xR)
incpt<-matrix(1,NROW(qxx),1)
x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
e<-y-x%*%as.matrix(b)
cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
#--collect coefficients and their variances--
ov<-summary(xR)
vif<-rbind(vif,vif(xR))
ss<-rbind(ss,diag(ov$cov*cs2))
#--collect robust coef. variances when there is heteroskedasticity--
#eb<-e^2
#x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
#hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
#ss<-rbind(ss,diag(hcm))
beta<-rbind(beta,coef(xR))
#--collect some model diagnostics--
dropt<-c("h")#"s","c","h",)
#--Ramsey RESET test--
p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
#--Wald test (H0: dropped variables have coefficient equal zero)--
o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
#--Heteroskedasticity test (H0: homoskedastic residuals)--
p3<-ncv.test(xR)$ChiSquare
#--Shapiro-Wilke normality test (H0: residuals normal)
p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
#--LaGrange Multiplier test for spatial autocorrelation: language--
#o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
#p5<-as.numeric(o$LMlag$statistic)
#--LaGrange Multiplier test for spatial autocorrelation: distance--
o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
p6<-as.numeric(o$LMlag$statistic)
#--model R2--
p7<-cr2
dng<-rbind(dng,cbind(p1,p2,p3,p4,p6,p7,dr2)) #p5, ,lr2
}
#--------------------------------------------
#--Rubin's formulas for combining estimates--
#--------------------------------------------
#--first find final regr. coefs. and p-values--
mnb<-apply(beta,2,mean)
####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
mnv<-apply(ss,2,mean)
vrT<-mnv+vrb*(1-nimp^(-1))
fst<-mnb^2/vrT
r<-(1+nimp^(-1))*vrb/mnv
v<-(nimp-1)*(1+r^(-1))^2
pval<-pf(fst,1,v,lower.tail=FALSE)
bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
#--Then combine the diagnostics we collected--
dng<-data.frame(dng)
names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagdd",
"R2:final model","R2:IV(distance)") #"lagll",,"R2:IV(language)"
r2<-apply(dng[,6:7],2,mean)
adng<-dng[,1:5]
mdm<-apply(adng,2,mean)
vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
aa<-4*mdm^2-2*vrd
aa[which(aa<0)]<-0
rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
vd<-(nimp-1)*(1+rd^(-1))^2
Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
#-All chi-sq we collected have df=1-------
pvald<-pf(Dm,1,vd,lower.tail=FALSE)
ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
names(ccc)<-c("Fstat","df","pvalue")
#--write results to csv file for perusal in spreadsheet--
write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
write.csv(bbb,file="OLSresults.csv",append=TRUE)
write.csv(r2,file="OLSresults.csv",append=TRUE)
write.csv(ccc,file="OLSresults.csv",append=TRUE)
aaa<-c(table(depvar), NROW(depvar),depvarname)
imp<-"number of imputations nimp="
impute=c(imp,nimp)
bbb
r2
ccc
bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
bbb
aaa
impute
Results for Dependent variable: I
> bbb
coef Fstat ddf pvalue VIF
(Intercept) 18.44740248 1.18092644 Inf 0.2771675 NA
fydd -5.18888766 1.05057652 Inf 0.3053744 1.609
ppc 0.43545365 0.75405308 Inf 0.3851960 1.882
rsc -0.00931386 0.00034735 Inf 0.9851304 2.563
jr 0.18588874 0.30614896 Inf 0.5800530 1.107
h 0.30645122 0.46520916 Inf 0.4951995 2.197
> r2
R2:final model R2:IV(distance)
0.4400281 0.8185609
> ccc
Fstat df pvalue
RESET 0.115 Inf 0.735
Wald on restrs. 0.392 Inf 0.531
NCV 1.686 Inf 0.194
SWnormal 1.720 Inf 0.190
lagdd 0.027 Inf 0.871
> bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
> bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
> bbb
mnb fst v pval VIF
(Intercept) 18.447 1.181 Inf 0.277 NA
fydd -5.189 1.051 Inf 0.305 1.609
ppc 0.435 0.754 Inf 0.385 1.882
rsc -0.009 0.000 Inf 0.985 2.563
jr 0.186 0.306 Inf 0.580 1.107
h 0.306 0.465 Inf 0.495 2.197
> aaa
2 3 3.5 4
"1" "3" "4" "6" "14" "i"
> impute
[1] "number of imputations nimp=" "3"
>
Ren Feng
Sept 17, 2010
Hi doug:
Finally we made the program work. As so far, we got the coefficients for every variables and diagnostics except r square. In the program, we disabled all the commands related to language matrix. The program can't run before mainly due to the presence of the two 0 variance variables, "s" and "c" (I forget the full name of them). I have posted all the programs and results on the new wiki page: edumod 85. Please check them out.
