User:Jasiellt

From InterSciWiki
Revision as of 08:36, 21 March 2010 by Douglas R. White (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

ANTH174_10 your depvar<-SCCS$v57 is

Contents

Intro

Contribution by Women to Agriculture

821.  Percent Female Contribution to Agriculture 
      . = Missing Data
      0 =  0%
      1 = 10%
      2 = 20%
      3 = 30%
      4 = 40%
      5 = 50%
      6 = 60%
      7 = 70%
      8 = 80%
      9 = 90%
     10 =100%

As in:

  • Ember R. Carol.American Anthropologist 1983 http://www.jstor.org/stable/676314
  • Dwight B. Heath. 1958 http://www.jstor.org/stable/2573784?cookieSet=1
  • Herbert Barry,III Brian L. Yoder 2002 http://ccr.sagepub.com/cgi/reprint/36/3/286
  • Herbert Barry, III Brian L. Yoder 2002 http://ccr.sagepub.com/cgi/content/abstract/36/3/286 Multiple Predictors of Contribution by Women to Agriculture. Cross-Cultural Research, Vol. 36, No. 3, 286-297 (2002). Doug 13:54, 6 February 2010 (PST)Should be an interesting comparison as these might not be significant in 2SLS. QUOTE P. 289-290:
  • The following six predictors were selected and eachdivided into two categories. For each predictor, the adjusted correlation with high contribution by women to agriculture is positive for the first category and negative for the second category.
  • 1. Residence is not patrilocal after marriage (Murdock & Wilson, 1972). The 35 societies in the first category, not patrilocal, combine 22 with matrilocal, 5 with avunculocal, 4 with neolocal, and 4 with ambilocal residence. The second category is patrilocal residence for 73 societies.
  • 2. General polygyny is the prevalent form of marriage in 38 societies (Murdock & Wilson, 1972). The second category, for 70 societies, combines 49 with limited polygyny and 21 with monogamy. None of the societies has polyandrous marriage.
  • 3. No written language is used (Murdock & Provost, 1973b). The 63 societies with no written language combine 39 where written records are absent and 24 with simple tallies. The second category, for 45 societies, combines 11 with nonwritten records such as picture writing, 7 with sparse accumulation of indigenous written records or long use of alien writing, and 27 with indigenous written records.
  • 4. Use of money is indigenous (Murdock & Provost, 1973b). The 43 societies with indigenous use of money combine 23 with metal coins or paper currency and 20 with articles of token value, such as

cowrie shells or wampum. The second category, for 65 societies, combines 25 with long use of currency of an alien people, 8 with usable articles such as salt, and 32 with exchange exclusively by barter.

  • 5. Population is sparse instead of dense (Murdock & Provost, 1973b). The 44 societies with sparse population combine 11 with fewer than 1 person per square mile, 10 with 5 people or fewer per square

mile, and 23 with 25 people or fewer per square mile. The second category, for 64 societies, combines 29 with 100 people or fewer per square mile and 35 with more than 100 people per square mile.

  • 6. No milk is obtained from domestic animals in 88 societies (Murdock & Morrow, 1970). The second category contains 20 societies that obtain milk from domestic animals.

TABLE 1 Unadjusted and Adjusted Correlations for Each of Six Cultural Customs With Contribution by Women to Agriculture

  1. Cultural Custom Unadjusted
  2. Correlation Adjusted Correlation
Not patrilocal .37*** .41***
Polygynous marriage .37*** .38***
No written language .49*** .35***
Indigenous money .04 .34***
Sparse population .31** .26**
No milk obtained .40*** .22*
Multiple correlation .73
Adjusted proportion .50***
NOTE: Correlations shown are for the world sample of 108 agricultural societies.

The adjusted correlation is the partial correlation of each custom with contribution by women to agriculture, adjusting for the effects of the other five customs. Summary values are the multiple correlation and the adjusted proportion of the variance accounted for by the multiple correlation. Probability values are for the difference from zero correlation.

  • p < .05. **p < .01. ***p < .001.


CODE

3. AGRICULTURE- CONTRIBUTION TO LOCAL FOOD SUPPLY

    35    1 = None
     3    2 = Non-food Crops
    17    3 = < 10%
    12    4 = < 50%, and less than any other single source, incl. trade
    42    5 = < 50%, and more than any other single source, incl. trade
    77    6 = Primarily agricultural

Day 1 (january 5th)

ctrl-a will allow you to select all contents found within the page.

Double-square brackets [[]] can be used for referencing something within this Wikipedia website

For indexing purposes use two equal signs: ==heading== make sure you have your heading in the middle of the two-brackets


Day 2 (january 7th)

cntrl-f allows search box to be in use. On website: first map is pre-conquest. second map is post-conquest when looking at war domestic animals can be a variable

xVR never change this drop-> except when there is an indep variable ---depvar add-> indep ver alter your xR is done

Correlation does NOT mean causation ex. mud -> rain Particularly with humans, what happens in one society easily affects others 95% American Indian population died because of new germs introduced from other areas of the world

for homework:find the dependent variables. Look at resources, map it out (GIS mapping)and go to codebook to find variable.

Day 3 (january 12)

textpad - adding a space and then copying results from wiki -allows me to see everything in a clear way and not all cramped together.

this talk explains how these benefits can contribute to a proposed open eRepository for anthropological knowledge! talk will have 5 parts.

1. Benefits: to analyze
  • A. solving Galton's problem with 2 stage least squares ( this is the "network problem" societies are related by historical historical network processes by which they influence one another.
  • B. Casualty C. MI D. GIS
2. benefits: to data
3. Benefits: findings and models
4. Benefits: findings of networks
5. benefits: galtons problem of nonindependence has been solved using instrumental variables for language and spatial clustering.

Instrumantal variables

  • Regression models are not well estimated when their error terms are not independent but contain the residusls of unspecified variables or systematic networl interactions.

Instrumental variables(IV) are observed variables in structural models(SEMs), other than the cause or treatment of interest, that can play an instrumantal role in identifying and estimating casual effects.

they require 2 stages OF OLS analysis 2SLS

notes for Feb.2

Outline notes exploring your topic readings, G scholer Maps - 60 "extra codes" to do/ or not -topic choice - readings on the topic( ex 1 and ex 2) - depvar and name must match - questions in class - avoid # sign in

Main topic: from corelations to causality Anthro's error: 1889 tylor- galton all disciplines suffer problem of determionistic theory in discipine + policy - challenger disaster ( ITS BEEN VERY COLD AND IF ITR WAS GNNA HARM THE EQUIPMENT, so they did a deterministic analysis, loses our governments tax money, just money thrown away) they were look ing for a regularity thats what they did wrong. only 3 people new that enrod was a disaster and was going to collapse! the whole system collapsing probabilistic thinking

2SLS

*the IVariables in 2SLS  tend to reduce significance when these are operative.
*W for matrices.
*multiply each row by y gives ur dependent variable! it is influenced byt the "background similarity" of a common linguistics.
*Open textpad, and space, do recopy sequence But this one you edumod page
  bbb
                     coef       Fstat          ddf     pvalue   VIF
(Intercept)    0.62582705  0.14144757 1.209601e+01 0.71335960    NA
fydd           0.67771647 11.84978111 2.263947e+01 0.00225505 2.317
pre_mar_sex    0.01874176  0.04552059 1.062731e+01 0.83508732 1.378
money          0.05520660  0.51828362 1.663585e+02 0.47258499 2.231
foodstress     0.06493711  0.23932771 2.442313e+01 0.62905693 1.454
femctrldwellg -0.14597995  1.44060574 4.661852e+00 0.28745733 1.607
wealthy        0.00418724  0.00100960 1.347243e+01 0.97511810 1.651
fratgrpstr     0.09011682  0.49066176 5.308538e+00 0.51314053 3.029
marrcaptives   0.28228275  4.80125914 9.560179e+00 0.05443314 1.782
plunder       -0.63349068  7.67922912 1.289722e+01 0.01598154 1.559
cultints      -0.00135031  0.00017242 1.521656e+03 0.98952517 5.159
roots          0.45780807  1.08082787 2.774757e+03 0.29860301 5.131
cereals        0.29738312  0.49166316 5.388559e+03 0.48321662 7.433
gath           0.09304185  0.83328532 2.484151e+01 0.37009901 3.226
plow          -0.53825557  1.99462116 8.583336e+01 0.16147400 3.274
hunt           0.12978343  1.59476698 2.976053e+03 0.20674532 5.214
fish          -0.01792164  0.05433460 1.863767e+02 0.81594124 3.091
anim           0.14029967  2.56869429 1.026653e+05 0.10900074 5.781
pigs          -0.53273657  2.58995167 1.290100e+02 0.10998795 2.333
milk          -0.78923686  4.42849239 7.040738e+01 0.03891527 4.554
bovines        0.30426680  0.77904007 4.042261e+02 0.37795946 4.454
tree           0.51246025  0.82815823 3.078110e+02 0.36351768 3.332
foodtrade     -0.01644826  2.94644139 1.591187e+02 0.08801288 1.525
foodscarc     -0.05109507  0.39073075 1.272545e+01 0.54296238 1.438
ecorich       -0.05468527  0.40071359 4.294367e+03 0.52675467 1.954
popdens        0.14071013  2.00011355 1.443825e+02 0.15944020 3.770
pathstress     0.03580103  1.10392267 6.572938e+01 0.29725451 2.503
exogamy        0.03398801  0.17983079 1.629889e+02 0.67207786 1.432
ncmallow      -0.01019023  0.06026975 1.740922e+01 0.80894084 1.511
famsize       -0.05764492  0.47947465 6.563276e+01 0.49110312 1.599
settype       -0.00717448  0.01081782 6.265193e+01 0.91749463 4.155
localjh        0.00205560  0.00014427 4.109672e+02 0.99042257 1.706
superjh       -0.10215123  0.95842870 2.390200e+02 0.32857292 2.713
moralgods      0.01874915  0.02836560 7.831533e+00 0.87052037 1.929
fempower      -0.08698682  2.42768469 1.833743e+01 0.13630000 1.428
femsubs        0.12607532  3.45693030 2.778303e+04 0.06299771 1.758
sexratio      -0.14286337  0.53003480 4.857335e+00 0.50015572 1.421
war            0.00615811  0.12327135 8.194928e+00 0.73436904 1.621
himilexp       0.25267635  1.56287499 5.900076e+02 0.21174037 1.612
wagelabor      0.03886651  0.11489348 7.404447e+01 0.73559972 1.499
>  r2
 R2:final model R2:IV(distance) R2:IV(language) 
      0.5457357       0.9860286       0.9812755 
>  ccc
                Fstat        df pvalue
RESET           5.164   104.814  0.025
Wald on restrs. 2.295    24.305  0.143
NCV             0.131  3190.844  0.717
SWnormal        0.146   138.033  0.703
lagll           2.859  5640.802  0.091
lagdd           5.023 14575.154  0.025
>  bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
>  bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
>  bbb
                 mnb    fst          v  pval   VIF
(Intercept)    0.626  0.141     12.096 0.713    NA
fydd           0.678 11.850     22.639 0.002 2.317
pre_mar_sex    0.019  0.046     10.627 0.835 1.378
money          0.055  0.518    166.359 0.473 2.231
foodstress     0.065  0.239     24.423 0.629 1.454
femctrldwellg -0.146  1.441      4.662 0.287 1.607
wealthy        0.004  0.001     13.472 0.975 1.651
fratgrpstr     0.090  0.491      5.309 0.513 3.029
marrcaptives   0.282  4.801      9.560 0.054 1.782
plunder       -0.633  7.679     12.897 0.016 1.559
cultints      -0.001  0.000   1521.656 0.990 5.159
roots          0.458  1.081   2774.757 0.299 5.131
cereals        0.297  0.492   5388.559 0.483 7.433
gath           0.093  0.833     24.842 0.370 3.226
plow          -0.538  1.995     85.833 0.161 3.274
hunt           0.130  1.595   2976.053 0.207 5.214
fish          -0.018  0.054    186.377 0.816 3.091
anim           0.140  2.569 102665.273 0.109 5.781
pigs          -0.533  2.590    129.010 0.110 2.333
milk          -0.789  4.428     70.407 0.039 4.554
bovines        0.304  0.779    404.226 0.378 4.454
tree           0.512  0.828    307.811 0.364 3.332
foodtrade     -0.016  2.946    159.119 0.088 1.525
foodscarc     -0.051  0.391     12.725 0.543 1.438
ecorich       -0.055  0.401   4294.367 0.527 1.954
popdens        0.141  2.000    144.382 0.159 3.770
pathstress     0.036  1.104     65.729 0.297 2.503
exogamy        0.034  0.180    162.989 0.672 1.432
ncmallow      -0.010  0.060     17.409 0.809 1.511
famsize       -0.058  0.479     65.633 0.491 1.599
settype       -0.007  0.011     62.652 0.917 4.155
localjh        0.002  0.000    410.967 0.990 1.706
superjh       -0.102  0.958    239.020 0.329 2.713
moralgods      0.019  0.028      7.832 0.871 1.929
fempower      -0.087  2.428     18.337 0.136 1.428
femsubs        0.126  3.457  27783.028 0.063 1.758
sexratio      -0.143  0.530      4.857 0.500 1.421
war            0.006  0.123      8.195 0.734 1.621
himilexp       0.253  1.563    590.008 0.212 1.612
wagelabor      0.039  0.115     74.044 0.736 1.499
>  aaa
         1          2          3          4          5                       
      "27"       "32"       "45"       "34"       "46"      "184" "polygyny" 
>  impute



PROGRAM for agricultural unrestricted model

 #Program 1 --> Program 2 below
 #Program 1 Part A
 #MI--make the imputed datasets
 #--change the following path to the directory with your data and program--
 setwd("C:/My Documents/MI")
 rm(list=ls(all=TRUE))
 options(echo=TRUE)
 #--you need the following two packages--you must install them first--
 library(foreign)
 library(mice)
 library(tripak)
 library(zoo)
 library(sp)
 library(maptools)
 library(spam))
 #--for program 2 below
 library(spdep)
 library(car)
 library(lmtest)
 library(sandwich)
 
 #--To find the citation for a package, use this function:---
 citation("mice")
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in auxiliary variables---
 load("vaux.Rdata",.GlobalEnv)
 row.names(vaux)<-NULL
 #--Read in the SCCS dataset---
 load("SCCS.Rdata",.GlobalEnv)
 
 #--look at first 6 rows of vaux--
 head(vaux)
 #--look at field names of vaux--
 names(vaux)
 #--check to see that rows are properly aligned in the two datasets--
 #--sum should equal 186---
 sum((SCCS$socname==vaux$socname)*1)
 #--remove the society name field--
 vaux<-vaux[,-28]
 names(vaux)
 
 #--Two nominal variables: brg and rlg----
 #--brg: consolidated Burton  Regions-----
 #0 = (rest of world) circumpolar, South and Meso-America, west North America
 #1 = Subsaharan Africa
 #2 = Middle Old World
 #3 = Southeast Asia, Insular Pacific, Sahul
 #4 = Eastern Americas
 #--rlg: Religion---
 #'0 (no world religion)'  
 #'1 (Christianity)'  
 #'2 (Islam)'  
 #'3 (Hindu/Buddhist)'  
 
 #--check to see number of missing values in vaux, 
 #--whether variables are numeric,
 #--and number of discrete values for each variable---
 vvn<-names(vaux)
 pp<-NULL
 for (i in 1:length(vvn)){
 nmiss<-length(which(is.na(vaux[,vvn[i]])))
 numeric<-is.numeric(vaux[,vvn[i]])
 numDiscrVals<-length(table(vaux[,vvn[i]]))
 pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
 }
 row.names(pp)<-vvn
 pp
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 #--extract variables to be used from SCCS, put in dataframe fx--
 fx<-data.frame(
 socname=SCCS$socname,socID=SCCS$"sccs#",
Agricultural=log(1+SCCS$v821), 
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180),
 lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182),
 valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
 fratgrpstr=SCCS$v570,
 marrcaptives=SCCS$v870,
 plunder=SCCS$v912, 
 ###climate<-(SCCS$v857==1 | SCCS$v857==2 | SCCS$v857==6)*1+(SCCS$v857==3 | SCCS$v857==4)*2+(SCCS$v857==5)*3,
 nonmatrel=SCCS$v52,
 lrgfam=SCCS$v68,
 exogamy=SCCS$v72,
 famsize=SCCS$v80,
 money=SCCS$v155,
 popdens=SCCS$v156,
 malesexag=SCCS$v175,
 ndrymonth=SCCS$v196,
 gath=SCCS$v203,
 hunt=SCCS$v204,
 fish=SCCS$v205,
 anim=SCCS$v206,
 brideprice=(SCCS$v208==1)*1,
 nuclearfam=(SCCS$v210<=3)*1,
 ncmallow=SCCS$v227,
 cultints=SCCS$v232,
 tree=(SCCS$v233==4)*1,
 roots=(SCCS$v233==5)*1,
 cereals=(SCCS$v233==6)*1,
 settype=SCCS$v234,
 localjh=(SCCS$v236-1),
 superjh=SCCS$v237,
 moralgods=SCCS$v238,
 segadlboys=SCCS$v242,
 plow=(SCCS$v243>1)*1,
 pigs=(SCCS$v244==2)*1,
 bovines=(SCCS$v244==7)*1,
 milk=(SCCS$v245>1)*1,
 agrlateboy=SCCS$v300,
 fempower=SCCS$v663,
 migr=(SCCS$v677==2)*1,
 foodtrade=SCCS$v819,
 dateobs=SCCS$v838,
 rain=SCCS$v854,
 temp=SCCS$v855,
 ecorich=SCCS$v857,
 #pctFemPolyg=SCCS$v872,
 femsubs=SCCS$v890,
 himilexp=(SCCS$v899==1)*1,
 AP1=SCCS$v921,
 AP2=SCCS$v928,
 pathstress=SCCS$v1260,
 war=SCCS$v1648,
 foodscarc=SCCS$v1685,
 sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
 wagelabor=SCCS$v1732,
 CVrain=SCCS$v1914/SCCS$v1913
 ) 
 #--look at first 6 rows of fx--
 head(fx)
 
 #--check to see number of missing values--
 #--also check whether numeric--
 vvn<-names(fx)
 pp<-NULL
 for (i in 1:length(vvn)){
 nmiss<-length(which(is.na(fx[,vvn[i]])))
 numeric<-is.numeric(fx[,vvn[i]])
 pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
 }
 row.names(pp)<-vvn
 pp
 
 #--identify variables with missing values--
 z<-which(pp[,1]>0)
 zv1<-vvn[z]
 zv1
 #--identify variables with non-missing values--
 z<-which(pp[,1]==0)
 zv2<-vvn[z]
 zv2
   
 
 
 
 
 
 
 #Program 1 Part B
 #-----------------------------
 #----Multiple imputation------
 #-----------------------------
 
 #--number of imputed data sets to create--
 #nimp<-10 #CHANGED TO TEST SPEEDUP  
 nimp<-3  ################################################################## speedup test
 #--one at a time, loop through those variables with missing values--
 for (i in 1:length(zv1)){
 #--attach the imputand to the auxiliary data--
 zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
 #--in the following line, the imputation is done--
 aqq<-complete(mice(zxx,maxit=20,m=nimp),action="long")
 #--during first iteration of the loop, create dataframe impdat--
 if (i==1){
 impdat<-data.frame(aqq[,c(".id",".imp")])
 }
 #--the imputand is placed as a field in impdat and named--
 impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
 names(impdat)[NCOL(impdat)]<-zv1[i]
 }
 
 #--now the non-missing variables are attached to impdat--
 gg<-NULL
 for (i in 1:nimp){
 gg<-rbind(gg,data.frame(fx[,zv2]))
 }
 impdat<-cbind(impdat,gg)
 
 #--take a look at the top 6 and bottom 6 rows of impdat--
 head(impdat)
 tail(impdat)
 
 #--impdat is saved as an R-format data file--
 save(impdat,file="impdat.Rdata")
 
 #Program 2 (12-18-2009)==
 #MI--estimate model, combine results
 
 rm(list=ls(all=TRUE))
 #--Set path to your directory with data and program--
 setwd("C:/My Documents/MI")
 options(echo=TRUE)
 
 #--need these packages for estimation and diagnostics--
 library(foreign)
 library(spdep)
 library(car)
 library(lmtest)
 library(sandwich)
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in original SCCS data---
 load("SCCS.Rdata",.GlobalEnv)
 #--Read in two weight matrices--
 ll<-as.matrix(read.dta("langwm.dta")[,-1])
 dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
 #--Read in the imputed dataset---
 load("impdat.Rdata",.GlobalEnv)
 
 #--create dep.varb. you wish to use from SCCS data--
 #--Here we sum variables measuring how much a society values children--
 #depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
 depvar<-log(1+SCCS$v821)
 #--find obs. for which dep. varb. is non-missing--
 zdv<-which(!is.na(depvar))
 depvar<-depvar[zdv]
 #HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
 depvarname<-"PctFemContAg"
 #--can add additional SCCS variable, but only if it has no missing values---
 dateobs<-SCCS$v838
 dateobs<-dateobs[zdv]
 #--look at histogram and frequencies for the dep. varb.--
 hist(depvar)
 table(depvar)
 
 #--modify weight matrices---
 #--set diagonal equal to zeros--
 diag(ll)<-0
 diag(dd)<-0
 #--use only obs. where dep. varb. non-missing--
 ll<-ll[zdv,zdv]
 dd<-dd[zdv,zdv]
 #--row standardize (rows sum to one)
 ll<-ll/rowSums(ll)
 dd<-dd/rowSums(dd)
 #--check to see that rows sum to one
 rowSums(ll)
 rowSums(dd)
 #--make weight matrix object for later autocorrelation test--
 wmatll<-mat2listw(as.matrix(ll))
 wmatdd<-mat2listw(as.matrix(dd))
 
 
 indpv<-c("femsubs","foodscarc",
 "fratgrpstr","marrcaptives","plunder",###"climate",
 "exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
 "war","himilexp","wagelabor",
 "famsize","settype","localjh","money",
 "cultints","roots","cereals","gath","hunt","fish",
 "anim","pigs","milk","plow","bovines","tree","foodtrade",
 "ndrymonth","ecorich",
 "popdens","pathstress","CVrain","rain","temp","AP1","AP2")
 
 
 #-----------------------------------------------------
 #---Estimate model on each imputed dataset------------
 #-----------------------------------------------------
 
 #--number of imputed datasets--
 #nimp<-10
 nimp<-3  ################################################################## speedup test
 
 #--will append values to these empty objects--
 vif<-NULL
 ss<-NULL
 beta<-NULL
 dng<-NULL
 
 #--loop through the imputed datasets--
 for (i in 1:nimp){
 
 #--select the ith imputed dataset--
 m9<-impdat[which(impdat$.imp==i),]
 #--retain only obs. for which dep. varb. is nonmissing--
 m9<-m9[zdv,]
 
 #--create spatially lagged dep. varbs.--
 y<-as.matrix(depvar)
 xx<-as.matrix(m9[,indpv])
 #--for instruments we use the spatial lag of our indep. varbs.--
 #--First, the spatially lagged varb. for distance--
 xdy<-dd%*%xx
 cyd<-dd%*%y
 o<-lm(cyd~xdy)
 #--the fitted value is our instrumental variable--
 fydd<-fitted(o)
 #--keep R2 from this regression--
 dr2<-summary(o)$r.squared
 #--Then, the spatially lagged varb. for language--
 xly<-ll%*%xx   
 cyl<-ll%*%y
 o<-lm(cyl~xly)
 #--the fitted value is our instrumental variable--
 fyll<-fitted(o)
 #--keep R2 from this regression--
 lr2<-summary(o)$r.squared
 m9<-cbind(m9,fydd,fyll)
 
 #--OLS estimate of unrestricted model--
 xUR<-lm(depvar~fyll+fydd++AP1+AP2+
 fratgrpstr+marrcaptives+plunder+###climate+
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
 settype+localjh+superjh+moralgods+fempower+femsubs+
 sexratio+war+himilexp+money+wagelabor
 ,data=m9)
 
 #--OLS estimate of restricted model--
 xR<-lm(depvar~fyll+fydd+AP1+AP2+
 fratgrpstr+marrcaptives+plunder+###climate+
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
 settype+localjh+superjh+moralgods+fempower+femsubs+
 sexratio+war+himilexp+money+wagelabor
 ,data=m9)
 # fydd+                ### fyll+ <--- ADDED fydd for distance clustering
 #  fratgrpstr+marrcaptives+plunder+   ###climate+fish+
 #  plow+                              ###cereals+gath+
 #  ###hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 #  ###pathstress+                             ecorich+popdens+exogamy+ncmallow+famsize+
 #  ###settype+localjh+superjh+moralgods+
 #  fempower+femsubs                   ###+
 #  ###war+himilexp  +money+wagelabor  ###sexratio+
 #  ,data=m9)
 #--corrected sigma2 and R2 for 2SLS--
 qxx<-m9
 qxx[,"fydd"]<-cyd
 qxx[,"fyll"]<-cyl
 b<-coef(xR)
 incpt<-matrix(1,NROW(qxx),1)
 x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
 e<-y-x%*%as.matrix(b)
 cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
 cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
 
 #--collect coefficients and their variances--
 ov<-summary(xR)
 vif<-rbind(vif,vif(xR))
 ss<-rbind(ss,diag(ov$cov*cs2))
 #--collect robust coef. variances when there is heteroskedasticity--
 #eb<-e^2
 #x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
 #hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
 #ss<-rbind(ss,diag(hcm))
 beta<-rbind(beta,coef(xR))
 
 #--collect some model diagnostics--
 dropt<-c("cereals","gath","plow","hunt","anim",
 "pigs","milk","bovines","foodscarc","ecorich",
 "popdens","pathstress","ncmallow","famsize","localjh",
 "superjh","moralgods","fempower","sexratio","money",
 "wagelabor","war","himilexp","tree")
 
 
 #--Ramsey RESET test--
 p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
 #--Wald test (H0: dropped variables have coefficient equal zero)--
 o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
 p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
 #--Heteroskedasticity test (H0: homoskedastic residuals)--
 p3<-ncv.test(xR)$ChiSquare
 #--Shapiro-Wilke normality test (H0: residuals normal)
 p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
 #--LaGrange Multiplier test for spatial autocorrelation: language--
 o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
 p5<-as.numeric(o$LMlag$statistic)
 #--LaGrange Multiplier test for spatial autocorrelation: distance--
 o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
 p6<-as.numeric(o$LMlag$statistic)
 #--model R2--
 p7<-cr2
 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
 
 }
 
 
 #--------------------------------------------
 #--Rubin's formulas for combining estimates--
 #--------------------------------------------
 
 #--first find final regr. coefs. and p-values--
 mnb<-apply(beta,2,mean)
 ####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
 vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
 mnv<-apply(ss,2,mean)
 vrT<-mnv+vrb*(1-nimp^(-1))
 fst<-mnb^2/vrT
 r<-(1+nimp^(-1))*vrb/mnv
 v<-(nimp-1)*(1+r^(-1))^2
 pval<-pf(fst,1,v,lower.tail=FALSE)
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
 
 #--Then combine the diagnostics we collected--
 dng<-data.frame(dng)
 names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
 "R2:final model","R2:IV(distance)","R2:IV(language)")
 r2<-apply(dng[,7:9],2,mean)
 adng<-dng[,1:6]
 mdm<-apply(adng,2,mean)
 vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
 aa<-4*mdm^2-2*vrd
 aa[which(aa<0)]<-0
 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
 vd<-(nimp-1)*(1+rd^(-1))^2
 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
 #-All chi-sq we collected have df=1-------
 pvald<-pf(Dm,1,vd,lower.tail=FALSE)
 ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
 names(ccc)<-c("Fstat","df","pvalue")
 #--write results to csv file for perusal in spreadsheet--
 write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
 write.csv(bbb,file="OLSresults.csv",append=TRUE)
 write.csv(r2,file="OLSresults.csv",append=TRUE)
 write.csv(ccc,file="OLSresults.csv",append=TRUE)
 
 aaa<-c(table(depvar), NROW(depvar),depvarname)
 imp<-"number of imputations nimp="
 impute=c(imp,nimp)
 lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
 lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
 lon[90]=-131
 bbb
 r2
 ccc
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 bbb
 aaa
 impute
 hist(log(cyl/y))

results

 r2
 R2:final model R2:IV(distance) R2:IV(language) 
      0.6457590       0.9817316       0.9833483 
>   ccc
                 Fstat        df pvalue
RESET           19.176   128.159  0.000
Wald on restrs.  2.877    57.587  0.095
NCV             23.280    22.166  0.000
SWnormal        16.474    25.890  0.000
lagll            2.367 44059.470  0.124
lagdd            3.225  7025.631  0.073
>   bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
>   bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
>   bbb
                mnb    fst            v  pval    VIF
(Intercept)  -1.476  0.437      531.357 0.509     NA
fyll          1.071  4.031     1086.194 0.045  2.938
fydd         -0.282  0.627      706.426 0.429  4.443
AP1          -0.020  0.321      927.642 0.571  2.168
AP2           0.041  0.437     9915.745 0.509  1.708
fratgrpstr   -0.024  0.079       24.389 0.782  2.773-->delte
marrcaptives -0.070  0.491      119.974 0.485  2.264
plunder      -0.196  1.230      317.362 0.268  1.894
cultints      0.163  4.544     1336.274 0.033  2.757-->sign
roots        -0.594  1.182      406.992 0.278 14.492
cereals      -0.309  0.345     3162.740 0.557 16.870
gath          0.080  0.711      289.046 0.400  2.399
plow         -0.147  0.246     3494.554 0.620  4.055
hunt          0.155  2.878     2966.755 0.090  3.906-->sign
fish          0.083  1.491   278216.956 0.222  2.410
anim         -0.135  2.920      220.205 0.089  4.408--.sign
pigs          0.079  0.096      333.018 0.757  2.640
milk         -0.375  1.917      711.290 0.167  4.215
bovines       0.496  3.254      361.576 0.072  4.717-->sign
tree         -0.271  0.193     2637.192 0.661  7.594
foodtrade    -0.007  0.651      164.858 0.421  1.774
foodscarc    -0.010  0.032       39.113 0.859  1.348-->delete
ecorich       0.052  0.540   408526.751 0.462  1.808
popdens      -0.103  1.861     2563.314 0.173  3.283
pathstress    0.007  0.086     1552.587 0.769  1.862-->delete
exogamy       0.083  1.587      169.793 0.210  1.569
ncmallow     -0.027  0.666       30.433 0.421  1.622
famsize       0.059  0.838     2424.650 0.360  1.611
settype       0.075  1.792 90457254.210 0.181  2.980
localjh      -0.047  0.121    49481.470 0.728  1.776
superjh       0.008  0.010     1960.800 0.921  2.890--.delete
moralgods     0.050  0.366    24012.380 0.545  2.462
fempower     -0.024  0.275       14.146 0.608  1.433
femsubs       0.457 62.526      631.620 0.000  1.900
sexratio      0.027  0.038       18.496 0.848  1.436-->delete
war          -0.012  1.059     6685.954 0.304  1.760
himilexp      0.113  0.444      110.975 0.507  1.595
money        -0.029  0.204     2183.696 0.652  2.415
wagelabor     0.090  0.951     6512.520 0.330  1.743
>   aaa
               0 1.79175946922805 1.94591014905531 2.39789527279837 
             "8"              "7"              "1"              "4" 
2.77258872223978 2.99573227355399 3.04452243772342 3.25809653802148 
             "9"              "2"              "9"              "9" 
3.43398720448515 3.46573590279973 3.52636052461616 3.58351893845611 
             "5"              "2"              "1"             "10" 
3.66356164612965 3.71357206670431 3.80666248977032 3.82864139648910 
             "1"             "12"              "2"              "9" 
3.93182563272433 4.02535169073515 4.04305126783455 4.11087386417331 
             "9"              "6"              "2"              "4" 
4.15888308335967 4.18965474202643 4.21950770517611 4.26267987704132 
             "1"             "11"              "1"              "5" 
4.33073334028633 4.39444915467244 4.45434729625351 4.56434819146784 
             "3"              "3"              "4"              "1" 
4.61512051684126                                   
             "1"            "142"   "PctFemContAg" 
>   impute
[1] "number of imputations nimp=" "3"                          
>   hist(log(cyl/y))
> 





program

  1. Program 1 --> Program 2 below
 #Program 1 Part A
 #MI--make the imputed datasets
 #--change the following path to the directory with your data and program--
 setwd("C:/My Documents/MI")
 rm(list=ls(all=TRUE))
 options(echo=TRUE)
 #--you need the following two packages--you must install them first--
 library(foreign)
 library(mice)
 library(tripak)
 library(zoo)
 library(sp)
 library(maptools)
 library(spam))
 #--for program 2 below
 library(spdep)
 library(car)
 library(lmtest)
 library(sandwich)
 
 #--To find the citation for a package, use this function:---
 citation("mice")
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in auxiliary variables---
 load("vaux.Rdata",.GlobalEnv)
 row.names(vaux)<-NULL
 #--Read in the SCCS dataset---
 load("SCCS.Rdata",.GlobalEnv)
 
 #--look at first 6 rows of vaux--
 head(vaux)
 #--look at field names of vaux--
 names(vaux)
 #--check to see that rows are properly aligned in the two datasets--
 #--sum should equal 186---
 sum((SCCS$socname==vaux$socname)*1)
 #--remove the society name field--
 vaux<-vaux[,-28]
 names(vaux)
 
 #--Two nominal variables: brg and rlg----
 #--brg: consolidated Burton  Regions-----
 #0 = (rest of world) circumpolar, South and Meso-America, west North America
 #1 = Subsaharan Africa
 #2 = Middle Old World
 #3 = Southeast Asia, Insular Pacific, Sahul
 #4 = Eastern Americas
 #--rlg: Religion---
 #'0 (no world religion)'  
 #'1 (Christianity)'  
 #'2 (Islam)'  
 #'3 (Hindu/Buddhist)'  
 
 #--check to see number of missing values in vaux, 
 #--whether variables are numeric,
 #--and number of discrete values for each variable---
 vvn<-names(vaux)
 pp<-NULL
 for (i in 1:length(vvn)){
 nmiss<-length(which(is.na(vaux[,vvn[i]])))
 numeric<-is.numeric(vaux[,vvn[i]])
 numDiscrVals<-length(table(vaux[,vvn[i]]))
 pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
 }
 row.names(pp)<-vvn
 pp
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 #--extract variables to be used from SCCS, put in dataframe fx--
 fx<-data.frame(
 socname=SCCS$socname,socID=SCCS$"sccs#",
Agricultural=SCCS$v821, 
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180),
 lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182),
 valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
 fratgrpstr=SCCS$v570,
 marrcaptives=SCCS$v870,
 plunder=SCCS$v912, 
 ###climate<-(SCCS$v857==1 | SCCS$v857==2 | SCCS$v857==6)*1+(SCCS$v857==3 | SCCS$v857==4)*2+(SCCS$v857==5)*3,
 nonmatrel=SCCS$v52,
 lrgfam=SCCS$v68,
 exogamy=SCCS$v72,
 famsize=SCCS$v80,
 money=SCCS$v155,
 popdens=SCCS$v156,
 malesexag=SCCS$v175,
 ndrymonth=SCCS$v196,
 gath=SCCS$v203,
 hunt=SCCS$v204,
 fish=SCCS$v205,
 anim=SCCS$v206,
 brideprice=(SCCS$v208==1)*1,
 nuclearfam=(SCCS$v210<=3)*1,
 ncmallow=SCCS$v227,
 cultints=SCCS$v232,
 tree=(SCCS$v233==4)*1,
 roots=(SCCS$v233==5)*1,
 cereals=(SCCS$v233==6)*1,
 settype=SCCS$v234,
 localjh=(SCCS$v236-1),
 superjh=SCCS$v237,
 moralgods=SCCS$v238,
 segadlboys=SCCS$v242,
 plow=(SCCS$v243>1)*1,
 pigs=(SCCS$v244==2)*1,
 bovines=(SCCS$v244==7)*1,
 milk=(SCCS$v245>1)*1,
 agrlateboy=SCCS$v300,
 fempower=SCCS$v663,
 migr=(SCCS$v677==2)*1,
 foodtrade=SCCS$v819,
 dateobs=SCCS$v838,
 rain=SCCS$v854,
 temp=SCCS$v855,
 ecorich=SCCS$v857,
 #pctFemPolyg=SCCS$v872,
 ###femsubs=SCCS$v890, ###Dep var is added to get this ... so deleted
 himilexp=(SCCS$v899==1)*1,
 AP1=SCCS$v921,
 AP2=SCCS$v928,
 pathstress=SCCS$v1260,
 war=SCCS$v1648,
 foodscarc=SCCS$v1685,
 sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
 wagelabor=SCCS$v1732,
 CVrain=SCCS$v1914/SCCS$v1913
 ) 
 #--look at first 6 rows of fx--
 head(fx)
 
 #--check to see number of missing values--
 #--also check whether numeric--
 vvn<-names(fx)
 pp<-NULL
 for (i in 1:length(vvn)){
 nmiss<-length(which(is.na(fx[,vvn[i]])))
 numeric<-is.numeric(fx[,vvn[i]])
 pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
 }
 row.names(pp)<-vvn
 pp
 
 #--identify variables with missing values--
 z<-which(pp[,1]>0)
 zv1<-vvn[z]
 zv1
 #--identify variables with non-missing values--
 z<-which(pp[,1]==0)
 zv2<-vvn[z]
 zv2
   
 
 
 
 
 
 
 #Program 1 Part B
 #-----------------------------
 #----Multiple imputation------
 #-----------------------------
 
 #--number of imputed data sets to create--
 #nimp<-10 #CHANGED TO TEST SPEEDUP  
 nimp<-3  ################################################################## speedup test
 #--one at a time, loop through those variables with missing values--
 for (i in 1:length(zv1)){
 #--attach the imputand to the auxiliary data--
 zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
 #--in the following line, the imputation is done--
 aqq<-complete(mice(zxx,maxit=20,m=nimp),action="long")
 #--during first iteration of the loop, create dataframe impdat--
 if (i==1){
 impdat<-data.frame(aqq[,c(".id",".imp")])
 }
 #--the imputand is placed as a field in impdat and named--
 impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
 names(impdat)[NCOL(impdat)]<-zv1[i]
 }
 
 #--now the non-missing variables are attached to impdat--
 gg<-NULL
 for (i in 1:nimp){
 gg<-rbind(gg,data.frame(fx[,zv2]))
 }
 impdat<-cbind(impdat,gg)
 
 #--take a look at the top 6 and bottom 6 rows of impdat--
 head(impdat)
 tail(impdat)
 
 #--impdat is saved as an R-format data file--
 save(impdat,file="impdat.Rdata")
 



 #Program 2 (12-18-2009)==
 #MI--estimate model, combine results
 
 rm(list=ls(all=TRUE))
 #--Set path to your directory with data and program--
 setwd("C:/My Documents/MI")
 options(echo=TRUE)
 
 #--need these packages for estimation and diagnostics--
 library(foreign)
 library(spdep)
 library(car)
 library(lmtest)
 library(sandwich)
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in original SCCS data---
 load("SCCS.Rdata",.GlobalEnv)
 #--Read in two weight matrices--
 ll<-as.matrix(read.dta("langwm.dta")[,-1])
 dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
 #--Read in the imputed dataset---
 load("impdat.Rdata",.GlobalEnv)
 
 #--create dep.varb. you wish to use from SCCS data--
 #--Here we sum variables measuring how much a society values children--
 #depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
 depvar<-SCCS$v821   ###log(1+SCCS$v821)  ###
 #--find obs. for which dep. varb. is non-missing--
 zdv<-which(!is.na(depvar))
 depvar<-depvar[zdv]
 #HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
 depvarname<-"PctFemContAg"
 #--can add additional SCCS variable, but only if it has no missing values---
 dateobs<-SCCS$v838
 dateobs<-dateobs[zdv]
 #--look at histogram and frequencies for the dep. varb.--
 hist(depvar)
 table(depvar)
 
 #--modify weight matrices---
 #--set diagonal equal to zeros--
 diag(ll)<-0
 diag(dd)<-0
 #--use only obs. where dep. varb. non-missing--
 ll<-ll[zdv,zdv]
 dd<-dd[zdv,zdv]
 #--row standardize (rows sum to one)
 ll<-ll/rowSums(ll)
 dd<-dd/rowSums(dd)
 #--check to see that rows sum to one
 rowSums(ll)
 rowSums(dd)
 #--make weight matrix object for later autocorrelation test--
 wmatll<-mat2listw(as.matrix(ll))
 wmatdd<-mat2listw(as.matrix(dd))
 
 
 indpv<-c("foodscarc",###"femsubs",
 "fratgrpstr","marrcaptives","plunder",###"climate",
 "exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
 "war","himilexp","wagelabor",
 "famsize","settype","localjh","money",
 "cultints","roots","cereals","gath","hunt","fish",
 "anim","pigs","milk","plow","bovines","tree","foodtrade",
 "ndrymonth","ecorich",
 "popdens","pathstress","CVrain","rain","temp","AP1","AP2")
 
 
 #-----------------------------------------------------
 #---Estimate model on each imputed dataset------------
 #-----------------------------------------------------
 
 #--number of imputed datasets--
 #nimp<-10
 nimp<-3  ################################################################## speedup test
 
 #--will append values to these empty objects--
 vif<-NULL
 ss<-NULL
 beta<-NULL
 dng<-NULL
 
 #--loop through the imputed datasets--
 for (i in 1:nimp){
 
 #--select the ith imputed dataset--
 m9<-impdat[which(impdat$.imp==i),]
 #--retain only obs. for which dep. varb. is nonmissing--
 m9<-m9[zdv,]
 
 #--create spatially lagged dep. varbs.--
 y<-as.matrix(depvar)
 xx<-as.matrix(m9[,indpv])
 #--for instruments we use the spatial lag of our indep. varbs.--
 #--First, the spatially lagged varb. for distance--
 xdy<-dd%*%xx
 cyd<-dd%*%y
 o<-lm(cyd~xdy)
 #--the fitted value is our instrumental variable--
 fydd<-fitted(o)
 #--keep R2 from this regression--
 dr2<-summary(o)$r.squared
 #--Then, the spatially lagged varb. for language--
 xly<-ll%*%xx   
 cyl<-ll%*%y
 o<-lm(cyl~xly)
 #--the fitted value is our instrumental variable--
 fyll<-fitted(o)
 #--keep R2 from this regression--
 lr2<-summary(o)$r.squared
 m9<-cbind(m9,fydd,fyll)
 
 #--OLS estimate of unrestricted model--
 xUR<-lm(depvar~fyll+fydd++AP1+AP2+
 fratgrpstr+marrcaptives+plunder+###climate+
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
 settype+localjh+superjh+moralgods+fempower+###femsubs+
 sexratio+war+himilexp+money+wagelabor
 ,data=m9)
 
 #--OLS estimate of restricted model--
 xR<-lm(depvar~fyll+fydd+AP1+AP2+
 marrcaptives+plunder+###climate+
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+
 ecorich+popdens+exogamy+ncmallow+famsize+
 settype+localjh+moralgods+fempower+###femsubs+
 war+himilexp+money+wagelabor
 ,data=m9)
 # fydd+                ### fyll+ <--- ADDED fydd for distance clustering
 #  fratgrpstr+marrcaptives+plunder+   ###climate+fish+
 #  plow+                              ###cereals+gath+
 #  ###hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 #  ###pathstress+                             ecorich+popdens+exogamy+ncmallow+famsize+
 #  ###settype+localjh+superjh+moralgods+
 #  fempower+femsubs                   ###+
 #  ###war+himilexp  +money+wagelabor  ###sexratio+
 #  ,data=m9)
 #--corrected sigma2 and R2 for 2SLS--
 qxx<-m9
 qxx[,"fydd"]<-cyd
 qxx[,"fyll"]<-cyl
 b<-coef(xR)
 incpt<-matrix(1,NROW(qxx),1)
 x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
 e<-y-x%*%as.matrix(b)
 cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
 cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
 
 #--collect coefficients and their variances--
 ov<-summary(xR)
 vif<-rbind(vif,vif(xR))
 ss<-rbind(ss,diag(ov$cov*cs2))
 #--collect robust coef. variances when there is heteroskedasticity--
 #eb<-e^2
 #x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
 #hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
 #ss<-rbind(ss,diag(hcm))
 beta<-rbind(beta,coef(xR))
 
 #--collect some model diagnostics--
 dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
 
 
 #--Ramsey RESET test--
 p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
 #--Wald test (H0: dropped variables have coefficient equal zero)--
 o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
 p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
 #--Heteroskedasticity test (H0: homoskedastic residuals)--
 p3<-ncv.test(xR)$ChiSquare
 #--Shapiro-Wilke normality test (H0: residuals normal)
 p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
 #--LaGrange Multiplier test for spatial autocorrelation: language--
 o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
 p5<-as.numeric(o$LMlag$statistic)
 #--LaGrange Multiplier test for spatial autocorrelation: distance--
 o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
 p6<-as.numeric(o$LMlag$statistic)
 #--model R2--
 p7<-cr2
 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
 
 }
 
 
 #--------------------------------------------
 #--Rubin's formulas for combining estimates--
 #--------------------------------------------
 
 #--first find final regr. coefs. and p-values--
 mnb<-apply(beta,2,mean)
 ####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
 vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
 mnv<-apply(ss,2,mean)
 vrT<-mnv+vrb*(1-nimp^(-1))
 fst<-mnb^2/vrT
 r<-(1+nimp^(-1))*vrb/mnv
 v<-(nimp-1)*(1+r^(-1))^2
 pval<-pf(fst,1,v,lower.tail=FALSE)
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
 
 #--Then combine the diagnostics we collected--
 dng<-data.frame(dng)
 names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
 "R2:final model","R2:IV(distance)","R2:IV(language)")
 r2<-apply(dng[,7:9],2,mean)
 adng<-dng[,1:6]
 mdm<-apply(adng,2,mean)
 vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
 aa<-4*mdm^2-2*vrd
 aa[which(aa<0)]<-0
 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
 vd<-(nimp-1)*(1+rd^(-1))^2
 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
 #-All chi-sq we collected have df=1-------
 pvald<-pf(Dm,1,vd,lower.tail=FALSE)
 ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
 names(ccc)<-c("Fstat","df","pvalue")
 #--write results to csv file for perusal in spreadsheet--
 write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
 write.csv(bbb,file="OLSresults.csv",append=TRUE)
 write.csv(r2,file="OLSresults.csv",append=TRUE)
 write.csv(ccc,file="OLSresults.csv",append=TRUE)
 
 aaa<-c(table(depvar), NROW(depvar),depvarname)
 imp<-"number of imputations nimp="
 impute=c(imp,nimp)
 lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
 lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
 lon[90]=-131
 bbb
 r2
 ccc
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 bbb
 aaa
 impute

2nd results replaced: no femsubs, no log(femcontagri) all diagnostics ok now

  R2:final model R2:IV(distance) R2:IV(language) 
      0.3953282       0.9721403       0.9850290 
                Fstat          df pvalue
RESET           0.024 2193952.700  0.878
Wald on restrs. 0.263   18727.343  0.608
NCV             1.494      76.357  0.225
SWnormal        1.333    3482.239  0.248
lagll           0.969 3370425.593  0.325
lagdd           1.506    3438.520  0.220
                mnb   fst          v  pval    VIF
(Intercept)  -2.089 0.003     38.590 0.959     NA
fyll          1.150 3.902    143.938 0.050  2.948 <-- keep
fydd          0.166 0.181   7639.121 0.670  3.536
AP1          -1.032 1.270    545.643 0.260  1.871
AP2           0.461 0.070    693.127 0.792  1.638
marrcaptives  0.901 0.120     62.218 0.730  1.838
plunder      -2.830 0.320    109.557 0.573  1.806
cultints     -0.673 0.091    125.740 0.763  2.730
roots        12.604 0.708    742.489 0.400 13.529
cereals       5.853 0.153   1469.957 0.695 16.601
gath         -1.909 0.501   1628.870 0.479  2.437
plow         -1.426 0.036 272312.902 0.850  3.262
hunt          1.792 0.498    969.412 0.480  3.648
fish         -0.493 0.068  35519.524 0.794  2.282
anim         -2.384 1.221   1901.243 0.269  4.181 <-- keep (tentative) VIF TOO HIGH
pigs          1.388 0.041   2172.703 0.839  2.406
milk         -5.653 0.663 653491.534 0.416  3.501
bovines       2.246 0.096   4124.342 0.757  4.129
tree          3.206 0.034   2295.950 0.855  7.500
foodtrade    -0.357 2.290    399.508 0.131  1.683 <-- keep
ecorich      -0.992 0.265    737.507 0.607  1.606
popdens      -2.238 1.114    516.971 0.292  3.110
exogamy       3.306 3.762   7018.488 0.052  1.366 <-- keep
ncmallow      1.095 1.268     19.050 0.274  1.576 <-- keep (tentative)
famsize      -0.327 0.033   1349.584 0.855  1.527
settype      -0.385 0.059    428.300 0.808  2.841
localjh      -1.571 0.182   2907.841 0.670  1.601
moralgods    -0.082 0.001 129752.699 0.970  2.165
fempower      0.172 0.020     22.125 0.888  1.415
war           0.198 0.388    127.192 0.534  1.488
himilexp      5.684 1.576  13660.481 0.209  1.502 <-- keep (tentative)
money         2.124 1.592   2468.882 0.207  2.064 <-- keep (tentative)
wagelabor    -0.006 0.000      3.451 0.999  1.579

program replaced from 9A| above, ready for trim of xR<-

  1. Program 1 --> Program 2 below
 #Program 1 Part A
 #MI--make the imputed datasets
 #--change the following path to the directory with your data and program--
 setwd("C:/My Documents/MI")
 rm(list=ls(all=TRUE))
 options(echo=TRUE)
 #--you need the following two packages--you must install them first--
 library(foreign)
 library(mice)
 library(tripak)
 library(zoo)
 library(sp)
 library(maptools)
 library(spam))
 #--for program 2 below
 library(spdep)
 library(car)
 library(lmtest)
 library(sandwich)
 
 #--To find the citation for a package, use this function:---
 citation("mice")
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in auxiliary variables---
 load("vaux.Rdata",.GlobalEnv)
 row.names(vaux)<-NULL
 #--Read in the SCCS dataset---
 load("SCCS.Rdata",.GlobalEnv)
 
 #--look at first 6 rows of vaux--
 head(vaux)
 #--look at field names of vaux--
 names(vaux)
 #--check to see that rows are properly aligned in the two datasets--
 #--sum should equal 186---
 sum((SCCS$socname==vaux$socname)*1)
 #--remove the society name field--
 vaux<-vaux[,-28]
 names(vaux)
 
 #--Two nominal variables: brg and rlg----
 #--brg: consolidated Burton  Regions-----
 #0 = (rest of world) circumpolar, South and Meso-America, west North America
 #1 = Subsaharan Africa
 #2 = Middle Old World
 #3 = Southeast Asia, Insular Pacific, Sahul
 #4 = Eastern Americas
 #--rlg: Religion---
 #'0 (no world religion)'  
 #'1 (Christianity)'  
 #'2 (Islam)'  
 #'3 (Hindu/Buddhist)'  
 
 #--check to see number of missing values in vaux, 
 #--whether variables are numeric,
 #--and number of discrete values for each variable---
 vvn<-names(vaux)
 pp<-NULL
 for (i in 1:length(vvn)){
 nmiss<-length(which(is.na(vaux[,vvn[i]])))
 numeric<-is.numeric(vaux[,vvn[i]])
 numDiscrVals<-length(table(vaux[,vvn[i]]))
 pp<-rbind(pp,cbind(data.frame(numeric),nmiss,numDiscrVals))
 }
 row.names(pp)<-vvn
 pp
 
 #MODIFY THESE STATEMENTS FOR A NEW PROJECT
 #--extract variables to be used from SCCS, put in dataframe fx--
 fx<-data.frame(
 socname=SCCS$socname,socID=SCCS$"sccs#",
Agricultural=SCCS$v821, 
lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180),
 lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182),
 valchild=(SCCS$v473+SCCS$v474+SCCS$v475+SCCS$v476),
 fratgrpstr=SCCS$v570,
 marrcaptives=SCCS$v870,
 plunder=SCCS$v912, 
 ###climate<-(SCCS$v857==1 | SCCS$v857==2 | SCCS$v857==6)*1+(SCCS$v857==3 | SCCS$v857==4)*2+(SCCS$v857==5)*3,
 nonmatrel=SCCS$v52,
 lrgfam=SCCS$v68,
 exogamy=SCCS$v72,
 famsize=SCCS$v80,
 money=SCCS$v155,
 popdens=SCCS$v156,
 malesexag=SCCS$v175,
 ndrymonth=SCCS$v196,
 gath=SCCS$v203,
 hunt=SCCS$v204,
 fish=SCCS$v205,
 anim=SCCS$v206,
 brideprice=(SCCS$v208==1)*1,
 nuclearfam=(SCCS$v210<=3)*1,
 ncmallow=SCCS$v227,
 cultints=SCCS$v232,
 tree=(SCCS$v233==4)*1,
 roots=(SCCS$v233==5)*1,
 cereals=(SCCS$v233==6)*1,
 settype=SCCS$v234,
 localjh=(SCCS$v236-1),
 superjh=SCCS$v237,
 moralgods=SCCS$v238,
 segadlboys=SCCS$v242,
 plow=(SCCS$v243>1)*1,
 pigs=(SCCS$v244==2)*1,
 bovines=(SCCS$v244==7)*1,
 milk=(SCCS$v245>1)*1,
 agrlateboy=SCCS$v300,
 fempower=SCCS$v663,
 migr=(SCCS$v677==2)*1,
 foodtrade=SCCS$v819,
 dateobs=SCCS$v838,
 rain=SCCS$v854,
 temp=SCCS$v855,
 ecorich=SCCS$v857,
 #pctFemPolyg=SCCS$v872,
 ###femsubs=SCCS$v890, ###Dep var is added to get this ... so deleted
 himilexp=(SCCS$v899==1)*1,
 AP1=SCCS$v921,
 AP2=SCCS$v928,
 pathstress=SCCS$v1260,
 war=SCCS$v1648,
 foodscarc=SCCS$v1685,
 sexratio=1+(SCCS$v1689>85)+(SCCS$v1689>115),
 wagelabor=SCCS$v1732,
 CVrain=SCCS$v1914/SCCS$v1913
 ) 
 #--look at first 6 rows of fx--
 head(fx)
 
 #--check to see number of missing values--
 #--also check whether numeric--
 vvn<-names(fx)
 pp<-NULL
 for (i in 1:length(vvn)){
 nmiss<-length(which(is.na(fx[,vvn[i]])))
 numeric<-is.numeric(fx[,vvn[i]])
 pp<-rbind(pp,cbind(nmiss,data.frame(numeric)))
 }
 row.names(pp)<-vvn
 pp
 
 #--identify variables with missing values--
 z<-which(pp[,1]>0)
 zv1<-vvn[z]
 zv1
 #--identify variables with non-missing values--
 z<-which(pp[,1]==0)
 zv2<-vvn[z]
 zv2
   
 

 
 #Program 1 Part B
 #-----------------------------
 #----Multiple imputation------
 #-----------------------------
 
 #--number of imputed data sets to create--
 #nimp<-10 #CHANGED TO TEST SPEEDUP  
 nimp<-3  ################################################################## speedup test
 #--one at a time, loop through those variables with missing values--
 for (i in 1:length(zv1)){
 #--attach the imputand to the auxiliary data--
 zxx<-data.frame(cbind(vaux,fx[,zv1[i]]))
 #--in the following line, the imputation is done--
 aqq<-complete(mice(zxx,maxit=20,m=nimp),action="long")
 #--during first iteration of the loop, create dataframe impdat--
 if (i==1){
 impdat<-data.frame(aqq[,c(".id",".imp")])
 }
 #--the imputand is placed as a field in impdat and named--
 impdat<-cbind(impdat,data.frame(aqq[,NCOL(zxx)]))
 names(impdat)[NCOL(impdat)]<-zv1[i]
 }
 
 #--now the non-missing variables are attached to impdat--
 gg<-NULL
 for (i in 1:nimp){
 gg<-rbind(gg,data.frame(fx[,zv2]))
 }
 impdat<-cbind(impdat,gg)
 
 #--take a look at the top 6 and bottom 6 rows of impdat--
 head(impdat)
 tail(impdat)
 
 #--impdat is saved as an R-format data file--
 save(impdat,file="impdat.Rdata")
 



 #Program 2 (12-18-2009)==
 #MI--estimate model, combine results
 
 rm(list=ls(all=TRUE))
 #--Set path to your directory with data and program--
 setwd("C:/My Documents/MI")
 options(echo=TRUE)
 
 #--need these packages for estimation and diagnostics--
 library(foreign)
 library(spdep)
 library(car)
 library(lmtest)
 library(sandwich)
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in original SCCS data---
 load("SCCS.Rdata",.GlobalEnv)
 #--Read in two weight matrices--
 ll<-as.matrix(read.dta("langwm.dta")[,-1])
 dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
 #--Read in the imputed dataset---
 load("impdat.Rdata",.GlobalEnv)
 
 #--create dep.varb. you wish to use from SCCS data--
 #--Here we sum variables measuring how much a society values children--
 #depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
 depvar<-SCCS$v821   ###log(1+SCCS$v821)  ###
 #--find obs. for which dep. varb. is non-missing--
 zdv<-which(!is.na(depvar))
 depvar<-depvar[zdv]
 #HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
 depvarname<-"PctFemContAg"
 #--can add additional SCCS variable, but only if it has no missing values---
 dateobs<-SCCS$v838
 dateobs<-dateobs[zdv]
 #--look at histogram and frequencies for the dep. varb.--
 hist(depvar)
 table(depvar)
 
 #--modify weight matrices---
 #--set diagonal equal to zeros--
 diag(ll)<-0
 diag(dd)<-0
 #--use only obs. where dep. varb. non-missing--
 ll<-ll[zdv,zdv]
 dd<-dd[zdv,zdv]
 #--row standardize (rows sum to one)
 ll<-ll/rowSums(ll)
 dd<-dd/rowSums(dd)
 #--check to see that rows sum to one
 rowSums(ll)
 rowSums(dd)
 #--make weight matrix object for later autocorrelation test--
 wmatll<-mat2listw(as.matrix(ll))
 wmatdd<-mat2listw(as.matrix(dd))
 
 
 indpv<-c("foodscarc",###"femsubs",
 "fratgrpstr","marrcaptives","plunder",###"climate",
 "exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
 "war","himilexp","wagelabor",
 "famsize","settype","localjh","money",
 "cultints","roots","cereals","gath","hunt","fish",
 "anim","pigs","milk","plow","bovines","tree","foodtrade",
 "ndrymonth","ecorich",
 "popdens","pathstress","CVrain","rain","temp","AP1","AP2")
 
 
 #-----------------------------------------------------
 #---Estimate model on each imputed dataset------------
 #-----------------------------------------------------
 
 #--number of imputed datasets--
 #nimp<-10
 nimp<-3  ################################################################## speedup test
 
 #--will append values to these empty objects--
 vif<-NULL
 ss<-NULL
 beta<-NULL
 dng<-NULL
 
 #--loop through the imputed datasets--
 for (i in 1:nimp){
 
 #--select the ith imputed dataset--
 m9<-impdat[which(impdat$.imp==i),]
 #--retain only obs. for which dep. varb. is nonmissing--
 m9<-m9[zdv,]
 
 #--create spatially lagged dep. varbs.--
 y<-as.matrix(depvar)
 xx<-as.matrix(m9[,indpv])
 #--for instruments we use the spatial lag of our indep. varbs.--
 #--First, the spatially lagged varb. for distance--
 xdy<-dd%*%xx
 cyd<-dd%*%y
 o<-lm(cyd~xdy)
 #--the fitted value is our instrumental variable--
 fydd<-fitted(o)
 #--keep R2 from this regression--
 dr2<-summary(o)$r.squared
 #--Then, the spatially lagged varb. for language--
 xly<-ll%*%xx   
 cyl<-ll%*%y
 o<-lm(cyl~xly)
 #--the fitted value is our instrumental variable--
 fyll<-fitted(o)
 #--keep R2 from this regression--
 lr2<-summary(o)$r.squared
 m9<-cbind(m9,fydd,fyll)
 
 #--OLS estimate of unrestricted model--
 xUR<-lm(depvar~fyll+fydd++AP1+AP2+
 fratgrpstr+marrcaptives+plunder+###climate+
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
 settype+localjh+superjh+moralgods+fempower+###femsubs+
 sexratio+war+himilexp+money+wagelabor
 ,data=m9)


 #--OLS estimate of restricted model--
 xR<-lm(depvar~fyll+fydd+###AP1+AP2+
 ###fratgrpstr+pigs+milk+bovines+tree+foodscarc+
 ###marrcaptives+plunder+roots###climate+
 ###cultints+cereals+gath+plow  
 ###hunt+fish+
 anim ###+###pigs+milk+bovines+tree+foodtrade+
 ###ecorich+popdens+exogamy+
 ###ncmallow  ###+###famsize+
 ###settype+localjh+moralgods+fempower+###femsubs+
 ###war+wagelabor+
 ###himilexp   ###+money######  
 ,data=m9)
 
 # fydd+                ### fyll+ <--- ADDED fydd for distance clustering
 #  fratgrpstr+marrcaptives+plunder+   ###climate+fish+
 #  plow+                              ###cereals+gath+
 #  ###hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 #  ###pathstress+                             ecorich+popdens+exogamy+ncmallow+famsize+
 #  ###settype+localjh+superjh+moralgods+
 #  fempower+femsubs                   ###+
 #  ###war+himilexp  +money+wagelabor  ###sexratio+
 #  ,data=m9)
 #--corrected sigma2 and R2 for 2SLS--
 qxx<-m9
 qxx[,"fydd"]<-cyd
 qxx[,"fyll"]<-cyl
 b<-coef(xR)
 incpt<-matrix(1,NROW(qxx),1)
 x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
 e<-y-x%*%as.matrix(b)
 cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
 cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
 
 #--collect coefficients and their variances--
 ov<-summary(xR)
 vif<-rbind(vif,vif(xR))
 ss<-rbind(ss,diag(ov$cov*cs2))
 #--collect robust coef. variances when there is heteroskedasticity--
 #eb<-e^2
 #x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
 #hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
 #ss<-rbind(ss,diag(hcm))
 beta<-rbind(beta,coef(xR))
 
 #--collect some model diagnostics--
 dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
 
 
 #--Ramsey RESET test--
 p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
 #--Wald test (H0: dropped variables have coefficient equal zero)--
 o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
 p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
 #--Heteroskedasticity test (H0: homoskedastic residuals)--
 p3<-ncv.test(xR)$ChiSquare
 #--Shapiro-Wilke normality test (H0: residuals normal)
 p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
 #--LaGrange Multiplier test for spatial autocorrelation: language--
 o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
 p5<-as.numeric(o$LMlag$statistic)
 #--LaGrange Multiplier test for spatial autocorrelation: distance--
 o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
 p6<-as.numeric(o$LMlag$statistic)
 #--model R2--
 p7<-cr2
 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
 
 }
 
 
 #--------------------------------------------
 #--Rubin's formulas for combining estimates--
 #--------------------------------------------
 
 #--first find final regr. coefs. and p-values--
 mnb<-apply(beta,2,mean)
 ####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
 vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
 mnv<-apply(ss,2,mean)
 vrT<-mnv+vrb*(1-nimp^(-1))
 fst<-mnb^2/vrT
 r<-(1+nimp^(-1))*vrb/mnv
 v<-(nimp-1)*(1+r^(-1))^2
 pval<-pf(fst,1,v,lower.tail=FALSE)
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
 
 #--Then combine the diagnostics we collected--
 dng<-data.frame(dng)
 names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
 "R2:final model","R2:IV(distance)","R2:IV(language)")
 r2<-apply(dng[,7:9],2,mean)
 adng<-dng[,1:6]
 mdm<-apply(adng,2,mean)
 vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
 aa<-4*mdm^2-2*vrd
 aa[which(aa<0)]<-0
 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
 vd<-(nimp-1)*(1+rd^(-1))^2
 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
 #-All chi-sq we collected have df=1-------
 pvald<-pf(Dm,1,vd,lower.tail=FALSE)
 ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
 names(ccc)<-c("Fstat","df","pvalue")
 #--write results to csv file for perusal in spreadsheet--
 write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
 write.csv(bbb,file="OLSresults.csv",append=TRUE)
 write.csv(r2,file="OLSresults.csv",append=TRUE)
 write.csv(ccc,file="OLSresults.csv",append=TRUE)
 
 aaa<-c(table(depvar), NROW(depvar),depvarname)
 imp<-"number of imputations nimp="
 impute=c(imp,nimp)
 lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
 lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
 lon[90]=-131
 bbb
 r2
 ccc
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 bbb
 aaa
 impute

add all the non sign. var into the model

 #Program 2 (12-18-2009)==
 #MI--estimate model, combine results
 
 rm(list=ls(all=TRUE))
 #--Set path to your directory with data and program--
 setwd("C:/My Documents/MI")
 options(echo=TRUE)
 
 #--need these packages for estimation and diagnostics--
 library(foreign)
 library(spdep)
 library(car)
 library(lmtest)
 library(sandwich)
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in original SCCS data---
 load("SCCS.Rdata",.GlobalEnv)
 #--Read in two weight matrices--
 ll<-as.matrix(read.dta("langwm.dta")[,-1])
 dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
 #--Read in the imputed dataset---
 load("impdat.Rdata",.GlobalEnv)
 
 #--create dep.varb. you wish to use from SCCS data--
 #--Here we sum variables measuring how much a society values children--
 #depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
 depvar<-SCCS$v821   ###log(1+SCCS$v821)  ###
 #--find obs. for which dep. varb. is non-missing--
 zdv<-which(!is.na(depvar))
 depvar<-depvar[zdv]
 #HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
 depvarname<-"PctFemContAg"
 #--can add additional SCCS variable, but only if it has no missing values---
 dateobs<-SCCS$v838
 dateobs<-dateobs[zdv]
 #--look at histogram and frequencies for the dep. varb.--
 hist(depvar)
 table(depvar)
 
 #--modify weight matrices---
 #--set diagonal equal to zeros--
 diag(ll)<-0
 diag(dd)<-0
 #--use only obs. where dep. varb. non-missing--
 ll<-ll[zdv,zdv]
 dd<-dd[zdv,zdv]
 #--row standardize (rows sum to one)
 ll<-ll/rowSums(ll)
 dd<-dd/rowSums(dd)
 #--check to see that rows sum to one
 rowSums(ll)
 rowSums(dd)
 #--make weight matrix object for later autocorrelation test--
 wmatll<-mat2listw(as.matrix(ll))
 wmatdd<-mat2listw(as.matrix(dd))
 
 
 indpv<-c("foodscarc",###"femsubs",
 "fratgrpstr","marrcaptives","plunder",###"climate",
 "exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
 "war","himilexp","wagelabor",
 "famsize","settype","localjh","money",
 "cultints","roots","cereals","gath","hunt","fish",
 "anim","pigs","milk","plow","bovines","tree","foodtrade",
 "ndrymonth","ecorich",
 "popdens","pathstress","CVrain","rain","temp","AP1","AP2")
 
 
 #-----------------------------------------------------
 #---Estimate model on each imputed dataset------------
 #-----------------------------------------------------
 
 #--number of imputed datasets--
 #nimp<-10
 nimp<-3  ################################################################## speedup test
 
 #--will append values to these empty objects--
 vif<-NULL
 ss<-NULL
 beta<-NULL
 dng<-NULL
 
 #--loop through the imputed datasets--
 for (i in 1:nimp){
 
 #--select the ith imputed dataset--
 m9<-impdat[which(impdat$.imp==i),]
 #--retain only obs. for which dep. varb. is nonmissing--
 m9<-m9[zdv,]
 
 #--create spatially lagged dep. varbs.--
 y<-as.matrix(depvar)
 xx<-as.matrix(m9[,indpv])
 #--for instruments we use the spatial lag of our indep. varbs.--
 #--First, the spatially lagged varb. for distance--
 xdy<-dd%*%xx
 cyd<-dd%*%y
 o<-lm(cyd~xdy)
 #--the fitted value is our instrumental variable--
 fydd<-fitted(o)
 #--keep R2 from this regression--
 dr2<-summary(o)$r.squared
 #--Then, the spatially lagged varb. for language--
 xly<-ll%*%xx   
 cyl<-ll%*%y
 o<-lm(cyl~xly)
 #--the fitted value is our instrumental variable--
 fyll<-fitted(o)
 #--keep R2 from this regression--
 lr2<-summary(o)$r.squared
 m9<-cbind(m9,fydd,fyll)
 
 #--OLS estimate of unrestricted model--
 xUR<-lm(depvar~fyll+fydd++AP1+AP2+
 fratgrpstr+marrcaptives+plunder+###climate+
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
 settype+localjh+superjh+moralgods+fempower+###femsubs+
 sexratio+war+himilexp+money+wagelabor
 ,data=m9)


 #--OLS estimate of restricted model--
 xR<-lm(depvar~fyll+fydd+anim+
 wagelabor+war+fempower+moralgods+famsize+settype+money+tree+
 ecorich+bovines+milk+pigs+fish+cereals+himilexp+popdens+ncmallow+
 localjh+marrcaptives+plunder+plow
 ,data=m9)
 
 # fydd+                ### fyll+ <--- ADDED fydd for distance clustering
 #  fratgrpstr   ###climate+
 #                                ###+gath+
 #  ###huntanim+++foodtrade+foodscarc+
 #  ###pathstress+                             +exogamy
 #  ###+superjh+
 #  +femsubs                   ###+
 #  ###war+wagelabor  ###sexratio+
 #  ,data=m9)
 #--corrected sigma2 and R2 for 2SLS--
 qxx<-m9
 qxx[,"fydd"]<-cyd
 qxx[,"fyll"]<-cyl
 b<-coef(xR)
 incpt<-matrix(1,NROW(qxx),1)
 x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
 e<-y-x%*%as.matrix(b)
 cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
 cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
 
 #--collect coefficients and their variances--
 ov<-summary(xR)
 vif<-rbind(vif,vif(xR))
 ss<-rbind(ss,diag(ov$cov*cs2))
 #--collect robust coef. variances when there is heteroskedasticity--
 #eb<-e^2
 #x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
 #hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
 #ss<-rbind(ss,diag(hcm))
 beta<-rbind(beta,coef(xR))
 
 #--collect some model diagnostics--
 dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
 
 
 #--Ramsey RESET test--
 p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
 #--Wald test (H0: dropped variables have coefficient equal zero)--
 o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
 p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
 #--Heteroskedasticity test (H0: homoskedastic residuals)--
 p3<-ncv.test(xR)$ChiSquare
 #--Shapiro-Wilke normality test (H0: residuals normal)
 p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
 #--LaGrange Multiplier test for spatial autocorrelation: language--
 o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
 p5<-as.numeric(o$LMlag$statistic)
 #--LaGrange Multiplier test for spatial autocorrelation: distance--
 o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
 p6<-as.numeric(o$LMlag$statistic)
 #--model R2--
 p7<-cr2
 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
 
 }
 
 
 #--------------------------------------------
 #--Rubin's formulas for combining estimates--
 #--------------------------------------------
 
 #--first find final regr. coefs. and p-values--
 mnb<-apply(beta,2,mean)
 ####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
 vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
 mnv<-apply(ss,2,mean)
 vrT<-mnv+vrb*(1-nimp^(-1))
 fst<-mnb^2/vrT
 r<-(1+nimp^(-1))*vrb/mnv
 v<-(nimp-1)*(1+r^(-1))^2
 pval<-pf(fst,1,v,lower.tail=FALSE)
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
 
 #--Then combine the diagnostics we collected--
 dng<-data.frame(dng)
 names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
 "R2:final model","R2:IV(distance)","R2:IV(language)")
 r2<-apply(dng[,7:9],2,mean)
 adng<-dng[,1:6]
 mdm<-apply(adng,2,mean)
 vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
 aa<-4*mdm^2-2*vrd
 aa[which(aa<0)]<-0
 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
 vd<-(nimp-1)*(1+rd^(-1))^2
 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
 #-All chi-sq we collected have df=1-------
 pvald<-pf(Dm,1,vd,lower.tail=FALSE)
 ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
 names(ccc)<-c("Fstat","df","pvalue")
 #--write results to csv file for perusal in spreadsheet--
 write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
 write.csv(bbb,file="OLSresults.csv",append=TRUE)
 write.csv(r2,file="OLSresults.csv",append=TRUE)
 write.csv(ccc,file="OLSresults.csv",append=TRUE)
 
 aaa<-c(table(depvar), NROW(depvar),depvarname)
 imp<-"number of imputations nimp="
 impute=c(imp,nimp)
 lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
 lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
 lon[90]=-131
 bbb
 r2
 ccc
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 bbb
 aaa
 impute

run 1

 r2
 R2:final model R2:IV(distance) R2:IV(language) 
      0.3413441       0.9715283       0.9843970 
>   ccc
                Fstat          df pvalue
RESET           0.013 7600999.574  0.909
Wald on restrs. 0.149    1601.024  0.699
NCV             2.231     223.178  0.137
SWnormal        0.001 7550390.705  0.972
lagll           0.674  766837.254  0.412
lagdd           1.115  905362.164  0.291
>   bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
>   bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
>   bbb
                 mnb   fst            v  pval   VIF
(Intercept)  -18.223 0.376 7.541460e+02 0.540    NA
fyll           1.494 8.611 7.632658e+03 0.003 2.348
fydd           0.158 0.185 1.766700e+03 0.668 3.098
anim          -2.322 1.813 3.221159e+06 0.178 2.673
wagelabor      2.425 0.970 1.000292e+03 0.325 1.457
war            0.367 1.434 1.145640e+02 0.234 1.356
fempower      -0.562 0.242 4.032000e+01 0.625 1.247-->high pval
moralgods      0.293 0.017 3.890900e+01 0.897 1.982--> high pval
famsize       -0.177 0.010 2.277362e+04 0.919 1.425-->high pval
settype        0.264 0.042 2.846148e+08 0.838 1.945-->high pval
money          1.068 0.446 5.281598e+04 0.504 1.853
tree          -8.484 1.140 7.128050e+02 0.286 1.505
ecorich       -1.531 0.686 3.391136e+04 0.408 1.491
bovines        3.839 0.291 1.073881e+04 0.590 3.948
milk          -4.831 0.475 1.505470e+02 0.492 3.303
pigs           4.867 0.555 5.303360e+02 0.457 2.127
fish          -0.452 0.073 1.420144e+05 0.787 1.775-->high pval
cereals       -5.410 1.176 3.649807e+04 0.278 1.852
himilexp       2.577 0.348 6.508700e+01 0.557 1.274
popdens       -2.334 1.548 3.661776e+05 0.213 2.472
ncmallow       1.105 1.683 1.412040e+02 0.197 1.412
localjh       -1.874 0.281 4.714552e+04 0.596 1.471
marrcaptives   1.775 0.512 1.727430e+02 0.475 1.730
plunder       -3.188 0.379 1.311800e+01 0.549 1.559
plow          -2.988 0.181 1.028542e+04 0.671 2.769-->high pval
>   aaa
             0              5              6             10             15 
           "8"            "7"            "1"            "4"            "9" 
            19             20             25             30             31 
           "2"            "9"            "9"            "5"            "2" 
            33             35             38             40             44 
           "1"           "10"            "1"           "12"            "2" 
            45             50             55             56             60 
           "9"            "9"            "6"            "2"            "4" 
            63             65             67             70             75 
           "1"           "11"            "1"            "5"            "3" 
            80             85             95            100                
           "3"            "4"            "1"            "1"          "142" 
               
"PctFemContAg" 
>   impute
[1] "number of imputations nimp=" "3"                          
>

run 2

  1. Program 2 (12-18-2009)==
 #MI--estimate model, combine results
 
 rm(list=ls(all=TRUE))
 #--Set path to your directory with data and program--
 setwd("C:/My Documents/MI")
 options(echo=TRUE)
 
 #--need these packages for estimation and diagnostics--
 library(foreign)
 library(spdep)
 library(car)
 library(lmtest)
 library(sandwich)
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in original SCCS data---
 load("SCCS.Rdata",.GlobalEnv)
 #--Read in two weight matrices--
 ll<-as.matrix(read.dta("langwm.dta")[,-1])
 dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
 #--Read in the imputed dataset---
 load("impdat.Rdata",.GlobalEnv)
 
 #--create dep.varb. you wish to use from SCCS data--
 #--Here we sum variables measuring how much a society values children--
 #depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
 depvar<-SCCS$v821   ###log(1+SCCS$v821)  ###
 #--find obs. for which dep. varb. is non-missing--
 zdv<-which(!is.na(depvar))
 depvar<-depvar[zdv]
 #HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
 depvarname<-"PctFemContAg"
 #--can add additional SCCS variable, but only if it has no missing values---
 dateobs<-SCCS$v838
 dateobs<-dateobs[zdv]
 #--look at histogram and frequencies for the dep. varb.--
 hist(depvar)
 table(depvar)
 
 #--modify weight matrices---
 #--set diagonal equal to zeros--
 diag(ll)<-0
 diag(dd)<-0
 #--use only obs. where dep. varb. non-missing--
 ll<-ll[zdv,zdv]
 dd<-dd[zdv,zdv]
 #--row standardize (rows sum to one)
 ll<-ll/rowSums(ll)
 dd<-dd/rowSums(dd)
 #--check to see that rows sum to one
 rowSums(ll)
 rowSums(dd)
 #--make weight matrix object for later autocorrelation test--
 wmatll<-mat2listw(as.matrix(ll))
 wmatdd<-mat2listw(as.matrix(dd))
 
 
 indpv<-c("foodscarc",###"femsubs",
 "fratgrpstr","marrcaptives","plunder",###"climate",
 "exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
 "war","himilexp","wagelabor",
 "famsize","settype","localjh","money",
 "cultints","roots","cereals","gath","hunt","fish",
 "anim","pigs","milk","plow","bovines","tree","foodtrade",
 "ndrymonth","ecorich",
 "popdens","pathstress","CVrain","rain","temp","AP1","AP2")
 
 
 #-----------------------------------------------------
 #---Estimate model on each imputed dataset------------
 #-----------------------------------------------------
 
 #--number of imputed datasets--
 #nimp<-10
 nimp<-3  ################################################################## speedup test
 
 #--will append values to these empty objects--
 vif<-NULL
 ss<-NULL
 beta<-NULL
 dng<-NULL
 
 #--loop through the imputed datasets--
 for (i in 1:nimp){
 
 #--select the ith imputed dataset--
 m9<-impdat[which(impdat$.imp==i),]
 #--retain only obs. for which dep. varb. is nonmissing--
 m9<-m9[zdv,]
 
 #--create spatially lagged dep. varbs.--
 y<-as.matrix(depvar)
 xx<-as.matrix(m9[,indpv])
 #--for instruments we use the spatial lag of our indep. varbs.--
 #--First, the spatially lagged varb. for distance--
 xdy<-dd%*%xx
 cyd<-dd%*%y
 o<-lm(cyd~xdy)
 #--the fitted value is our instrumental variable--
 fydd<-fitted(o)
 #--keep R2 from this regression--
 dr2<-summary(o)$r.squared
 #--Then, the spatially lagged varb. for language--
 xly<-ll%*%xx   
 cyl<-ll%*%y
 o<-lm(cyl~xly)
 #--the fitted value is our instrumental variable--
 fyll<-fitted(o)
 #--keep R2 from this regression--
 lr2<-summary(o)$r.squared
 m9<-cbind(m9,fydd,fyll)
 
 #--OLS estimate of unrestricted model--
 xUR<-lm(depvar~fyll+fydd++AP1+AP2+
 fratgrpstr+marrcaptives+plunder+###climate+
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
 settype+localjh+superjh+moralgods+fempower+###femsubs+
 sexratio+war+himilexp+money+wagelabor
 ,data=m9)


 #--OLS estimate of restricted model--
 xR<-lm(depvar~fyll+fydd+anim+
 wagelabor+war+money+tree+
 ecorich+bovines+milk+pigs+cereals+himilexp+popdens+ncmallow+
 localjh+marrcaptives+plunder
 ,data=m9)
 
 # fydd+                ### fyll+ <--- ADDED fydd for distance clustering
 #  fratgrpstr   ###climate+
 #                                ###+gath+
 #  ###huntanim+++foodtrade+foodscarc+
 #  ###pathstress+                             +exogamy
 #  ###+superjh+
 #  +femsubs                   ###+
 #  ###war+wagelabor  ###sexratio+
 #  ,data=m9)
 #--corrected sigma2 and R2 for 2SLS--
 qxx<-m9
 qxx[,"fydd"]<-cyd
 qxx[,"fyll"]<-cyl
 b<-coef(xR)
 incpt<-matrix(1,NROW(qxx),1)
 x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
 e<-y-x%*%as.matrix(b)
 cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
 cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
 
 #--collect coefficients and their variances--
 ov<-summary(xR)
 vif<-rbind(vif,vif(xR))
 ss<-rbind(ss,diag(ov$cov*cs2))
 #--collect robust coef. variances when there is heteroskedasticity--
 #eb<-e^2
 #x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
 #hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
 #ss<-rbind(ss,diag(hcm))
 beta<-rbind(beta,coef(xR))
 
 #--collect some model diagnostics--
 dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
 
 
 #--Ramsey RESET test--
 p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
 #--Wald test (H0: dropped variables have coefficient equal zero)--
 o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
 p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
 #--Heteroskedasticity test (H0: homoskedastic residuals)--
 p3<-ncv.test(xR)$ChiSquare
 #--Shapiro-Wilke normality test (H0: residuals normal)
 p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
 #--LaGrange Multiplier test for spatial autocorrelation: language--
 o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
 p5<-as.numeric(o$LMlag$statistic)
 #--LaGrange Multiplier test for spatial autocorrelation: distance--
 o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
 p6<-as.numeric(o$LMlag$statistic)
 #--model R2--
 p7<-cr2
 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
 
 }
 
 
 #--------------------------------------------
 #--Rubin's formulas for combining estimates--
 #--------------------------------------------
 
 #--first find final regr. coefs. and p-values--
 mnb<-apply(beta,2,mean)
 ####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
 vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
 mnv<-apply(ss,2,mean)
 vrT<-mnv+vrb*(1-nimp^(-1))
 fst<-mnb^2/vrT
 r<-(1+nimp^(-1))*vrb/mnv
 v<-(nimp-1)*(1+r^(-1))^2
 pval<-pf(fst,1,v,lower.tail=FALSE)
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
 
 #--Then combine the diagnostics we collected--
 dng<-data.frame(dng)
 names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
 "R2:final model","R2:IV(distance)","R2:IV(language)")
 r2<-apply(dng[,7:9],2,mean)
 adng<-dng[,1:6]
 mdm<-apply(adng,2,mean)
 vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
 aa<-4*mdm^2-2*vrd
 aa[which(aa<0)]<-0
 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
 vd<-(nimp-1)*(1+rd^(-1))^2
 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
 #-All chi-sq we collected have df=1-------
 pvald<-pf(Dm,1,vd,lower.tail=FALSE)
 ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
 names(ccc)<-c("Fstat","df","pvalue")
 #--write results to csv file for perusal in spreadsheet--
 write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
 write.csv(bbb,file="OLSresults.csv",append=TRUE)
 write.csv(r2,file="OLSresults.csv",append=TRUE)
 write.csv(ccc,file="OLSresults.csv",append=TRUE)
 
 aaa<-c(table(depvar), NROW(depvar),depvarname)
 imp<-"number of imputations nimp="
 impute=c(imp,nimp)
 lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
 lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
 lon[90]=-131
 bbb
 r2
 ccc
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 bbb
 aaa
 impute

results round2

 R2:final model R2:IV(distance) R2:IV(language) 
      0.3341113       0.9715283       0.9843970 
>   ccc
                Fstat           df pvalue
RESET           0.032   194311.812  0.858
Wald on restrs. 0.149     1601.024  0.699
NCV             2.536      644.915  0.112
SWnormal        0.001  5281494.912  0.980
lagll           0.622   572952.563  0.430
lagdd           1.069 34882013.728  0.301
>   bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
>   bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
>   bbb
                 mnb    fst            v  pval   VIF
(Intercept)  -24.385  0.936      101.704 0.336    NA
fyll           1.562 11.023    71482.352 0.001 2.096
fydd           0.167  0.251      733.487 0.617 2.610
anim          -2.274  2.103   597844.287 0.147 2.298
wagelabor      2.673  1.270      692.565 0.260 1.399
war            0.348  1.426      116.112 0.235 1.275
money          1.232  0.685   832690.379 0.408 1.677
tree          -7.039  0.884     6306.554 0.347 1.417
ecorich       -1.508  0.741    19516.569 0.390 1.392
bovines        3.013  0.225    19377.422 0.635 3.274--> high
milk          -4.477  0.499     1075.717 0.480 2.924--.high
pigs           5.057  0.684      301.301 0.409 1.916
cereals       -4.392  0.912 25451007.489 0.340 1.642
himilexp       2.609  0.410      155.154 0.523 1.198-->high
popdens       -2.563  2.331    10918.983 0.127 2.046
ncmallow       1.094  1.804       89.224 0.183 1.318
localjh       -1.838  0.366    21002.568 0.545 1.127-->high
marrcaptives   1.853  0.615       82.557 0.435 1.583-->high
plunder       -3.500  0.497       12.920 0.493 1.485-->high
>   aaa
             0              5              6             10             15 
           "8"            "7"            "1"            "4"            "9" 
            19             20             25             30             31 
           "2"            "9"            "9"            "5"            "2" 
            33             35             38             40             44 
           "1"           "10"            "1"           "12"            "2" 
            45             50             55             56             60 
           "9"            "9"            "6"            "2"            "4" 
            63             65             67             70             75 
           "1"           "11"            "1"            "5"            "3" 
            80             85             95            100                
           "3"            "4"            "1"            "1"          "142" 
               
"PctFemContAg" 
>   impute
[1] "number of imputations nimp=" "3"                          
> 

run 3

  1. Program 2 (12-18-2009)==
 #MI--estimate model, combine results
 
 rm(list=ls(all=TRUE))
 #--Set path to your directory with data and program--
 setwd("C:/My Documents/MI")
 options(echo=TRUE)
 
 #--need these packages for estimation and diagnostics--
 library(foreign)
 library(spdep)
 library(car)
 library(lmtest)
 library(sandwich)
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in original SCCS data---
 load("SCCS.Rdata",.GlobalEnv)
 #--Read in two weight matrices--
 ll<-as.matrix(read.dta("langwm.dta")[,-1])
 dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
 #--Read in the imputed dataset---
 load("impdat.Rdata",.GlobalEnv)
 
 #--create dep.varb. you wish to use from SCCS data--
 #--Here we sum variables measuring how much a society values children--
 #depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
 depvar<-SCCS$v821   ###log(1+SCCS$v821)  ###
 #--find obs. for which dep. varb. is non-missing--
 zdv<-which(!is.na(depvar))
 depvar<-depvar[zdv]
 #HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
 depvarname<-"PctFemContAg"
 #--can add additional SCCS variable, but only if it has no missing values---
 dateobs<-SCCS$v838
 dateobs<-dateobs[zdv]
 #--look at histogram and frequencies for the dep. varb.--
 hist(depvar)
 table(depvar)
 
 #--modify weight matrices---
 #--set diagonal equal to zeros--
 diag(ll)<-0
 diag(dd)<-0
 #--use only obs. where dep. varb. non-missing--
 ll<-ll[zdv,zdv]
 dd<-dd[zdv,zdv]
 #--row standardize (rows sum to one)
 ll<-ll/rowSums(ll)
 dd<-dd/rowSums(dd)
 #--check to see that rows sum to one
 rowSums(ll)
 rowSums(dd)
 #--make weight matrix object for later autocorrelation test--
 wmatll<-mat2listw(as.matrix(ll))
 wmatdd<-mat2listw(as.matrix(dd))
 
 
 indpv<-c("foodscarc",###"femsubs",
 "fratgrpstr","marrcaptives","plunder",###"climate",
 "exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
 "war","himilexp","wagelabor",
 "famsize","settype","localjh","money",
 "cultints","roots","cereals","gath","hunt","fish",
 "anim","pigs","milk","plow","bovines","tree","foodtrade",
 "ndrymonth","ecorich",
 "popdens","pathstress","CVrain","rain","temp","AP1","AP2")
 
 
 #-----------------------------------------------------
 #---Estimate model on each imputed dataset------------
 #-----------------------------------------------------
 
 #--number of imputed datasets--
 #nimp<-10
 nimp<-3  ################################################################## speedup test
 
 #--will append values to these empty objects--
 vif<-NULL
 ss<-NULL
 beta<-NULL
 dng<-NULL
 
 #--loop through the imputed datasets--
 for (i in 1:nimp){
 
 #--select the ith imputed dataset--
 m9<-impdat[which(impdat$.imp==i),]
 #--retain only obs. for which dep. varb. is nonmissing--
 m9<-m9[zdv,]
 
 #--create spatially lagged dep. varbs.--
 y<-as.matrix(depvar)
 xx<-as.matrix(m9[,indpv])
 #--for instruments we use the spatial lag of our indep. varbs.--
 #--First, the spatially lagged varb. for distance--
 xdy<-dd%*%xx
 cyd<-dd%*%y
 o<-lm(cyd~xdy)
 #--the fitted value is our instrumental variable--
 fydd<-fitted(o)
 #--keep R2 from this regression--
 dr2<-summary(o)$r.squared
 #--Then, the spatially lagged varb. for language--
 xly<-ll%*%xx   
 cyl<-ll%*%y
 o<-lm(cyl~xly)
 #--the fitted value is our instrumental variable--
 fyll<-fitted(o)
 #--keep R2 from this regression--
 lr2<-summary(o)$r.squared
 m9<-cbind(m9,fydd,fyll)
 
 #--OLS estimate of unrestricted model--
 xUR<-lm(depvar~fyll+fydd++AP1+AP2+
 fratgrpstr+marrcaptives+plunder+###climate+
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
 settype+localjh+superjh+moralgods+fempower+###femsubs+
 sexratio+war+himilexp+money+wagelabor
 ,data=m9)
 #--OLS estimate of restricted model--
 xR<-lm(depvar~fyll+fydd+anim+
 wagelabor+war+money+tree+
 ecorich+pigs+cereals+popdens+ncmallow
 ,data=m9)
 
 # fydd+                ### fyll+ <--- ADDED fydd for distance clustering
 #  fratgrpstr   ###climate+
 #                                ###+gath+
 #  ###huntanim+++foodtrade+foodscarc+
 #  ###pathstress+                             +exogamy
 #  ###+superjh+
 #  +femsubs                   ###+
 #  ###war+wagelabor  ###sexratio+
 #  ,data=m9)
 #--corrected sigma2 and R2 for 2SLS--
 qxx<-m9
 qxx[,"fydd"]<-cyd
 qxx[,"fyll"]<-cyl
 b<-coef(xR)
 incpt<-matrix(1,NROW(qxx),1)
 x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
 e<-y-x%*%as.matrix(b)
 cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
 cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
 
 #--collect coefficients and their variances--
 ov<-summary(xR)
 vif<-rbind(vif,vif(xR))
 ss<-rbind(ss,diag(ov$cov*cs2))
 #--collect robust coef. variances when there is heteroskedasticity--
 #eb<-e^2
 #x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
 #hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
 #ss<-rbind(ss,diag(hcm))
 beta<-rbind(beta,coef(xR))
 
 #--collect some model diagnostics--
 dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
 
 
 #--Ramsey RESET test--
 p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
 #--Wald test (H0: dropped variables have coefficient equal zero)--
 o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
 p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
 #--Heteroskedasticity test (H0: homoskedastic residuals)--
 p3<-ncv.test(xR)$ChiSquare
 #--Shapiro-Wilke normality test (H0: residuals normal)
 p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
 #--LaGrange Multiplier test for spatial autocorrelation: language--
 o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
 p5<-as.numeric(o$LMlag$statistic)
 #--LaGrange Multiplier test for spatial autocorrelation: distance--
 o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
 p6<-as.numeric(o$LMlag$statistic)
 #--model R2--
 p7<-cr2
 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
 
 }
 
 
 #--------------------------------------------
 #--Rubin's formulas for combining estimates--
 #--------------------------------------------
 
 #--first find final regr. coefs. and p-values--
 mnb<-apply(beta,2,mean)
 ####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
 vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
 mnv<-apply(ss,2,mean)
 vrT<-mnv+vrb*(1-nimp^(-1))
 fst<-mnb^2/vrT
 r<-(1+nimp^(-1))*vrb/mnv
 v<-(nimp-1)*(1+r^(-1))^2
 pval<-pf(fst,1,v,lower.tail=FALSE)
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
 
 #--Then combine the diagnostics we collected--
 dng<-data.frame(dng)
 names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
 "R2:final model","R2:IV(distance)","R2:IV(language)")
 r2<-apply(dng[,7:9],2,mean)
 adng<-dng[,1:6]
 mdm<-apply(adng,2,mean)
 vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
 aa<-4*mdm^2-2*vrd
 aa[which(aa<0)]<-0
 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
 vd<-(nimp-1)*(1+rd^(-1))^2
 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
 #-All chi-sq we collected have df=1-------
 pvald<-pf(Dm,1,vd,lower.tail=FALSE)
 ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
 names(ccc)<-c("Fstat","df","pvalue")
 #--write results to csv file for perusal in spreadsheet--
 write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
 write.csv(bbb,file="OLSresults.csv",append=TRUE)
 write.csv(r2,file="OLSresults.csv",append=TRUE)
 write.csv(ccc,file="OLSresults.csv",append=TRUE)
 
 aaa<-c(table(depvar), NROW(depvar),depvarname)
 imp<-"number of imputations nimp="
 impute=c(imp,nimp)
 lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
 lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
 lon[90]=-131
 bbb
 r2
 ccc
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 bbb
 aaa
 impute

run 3 results

 r2
 R2:final model R2:IV(distance) R2:IV(language) 
      0.3109951       0.9715283       0.9843970 
>   ccc
                Fstat          df pvalue
RESET           0.141   58133.382  0.708
Wald on restrs. 0.149    1601.024  0.699
NCV             3.810   13984.923  0.051
SWnormal        0.004  102948.381  0.948
lagll           0.491  471582.214  0.484
lagdd           0.889 2150990.650  0.346
>   bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
>   bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
>   bbb
                mnb    fst            v  pval   VIF
(Intercept) -36.386  3.652  7462087.063 0.056    NA
fyll          1.642 13.038    22734.652 0.000 1.983
fydd          0.260  0.687   305226.983 0.407 2.395
anim         -2.429  3.545  5249510.920 0.060 1.577-->sign.
wagelabor     2.581  1.280     1289.262 0.258 1.320
war           0.531  3.951       67.923 0.051 1.060-->sign.
money         1.263  0.767  7536694.393 0.381 1.596-->remove
tree         -6.587  0.819    35644.327 0.365 1.362-->remove
ecorich      -1.114  0.432 11886309.109 0.511 1.327-->remove
pigs          5.311  1.030    17940.744 0.310 1.478
cereals      -4.380  0.953    18041.723 0.329 1.576-->remove
popdens      -2.678  3.083   677080.151 0.079 1.723-->sign
ncmallow      0.802  1.112      195.466 0.293 1.199
>   aaa
             0              5              6             10             15 
           "8"            "7"            "1"            "4"            "9" 
            19             20             25             30             31 
           "2"            "9"            "9"            "5"            "2" 
            33             35             38             40             44 
           "1"           "10"            "1"           "12"            "2" 
            45             50             55             56             60 
           "9"            "9"            "6"            "2"            "4" 
            63             65             67             70             75 
           "1"           "11"            "1"            "5"            "3" 
            80             85             95            100                
           "3"            "4"            "1"            "1"          "142" 
               
"PctFemContAg" 
>   impute
[1] "number of imputations nimp=" "3"                          
>

run 4

  1. Program 2 (12-18-2009)==
 #MI--estimate model, combine results
 
 rm(list=ls(all=TRUE))
 #--Set path to your directory with data and program--
 setwd("C:/My Documents/MI")
 options(echo=TRUE)
 
 #--need these packages for estimation and diagnostics--
 library(foreign)
 library(spdep)
 library(car)
 library(lmtest)
 library(sandwich)
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in original SCCS data---
 load("SCCS.Rdata",.GlobalEnv)
 #--Read in two weight matrices--
 ll<-as.matrix(read.dta("langwm.dta")[,-1])
 dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
 #--Read in the imputed dataset---
 load("impdat.Rdata",.GlobalEnv)
 
 #--create dep.varb. you wish to use from SCCS data--
 #--Here we sum variables measuring how much a society values children--
 #depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
 depvar<-SCCS$v821   ###log(1+SCCS$v821)  ###
 #--find obs. for which dep. varb. is non-missing--
 zdv<-which(!is.na(depvar))
 depvar<-depvar[zdv]
 #HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
 depvarname<-"PctFemContAg"
 #--can add additional SCCS variable, but only if it has no missing values---
 dateobs<-SCCS$v838
 dateobs<-dateobs[zdv]
 #--look at histogram and frequencies for the dep. varb.--
 hist(depvar)
 table(depvar)
 
 #--modify weight matrices---
 #--set diagonal equal to zeros--
 diag(ll)<-0
 diag(dd)<-0
 #--use only obs. where dep. varb. non-missing--
 ll<-ll[zdv,zdv]
 dd<-dd[zdv,zdv]
 #--row standardize (rows sum to one)
 ll<-ll/rowSums(ll)
 dd<-dd/rowSums(dd)
 #--check to see that rows sum to one
 rowSums(ll)
 rowSums(dd)
 #--make weight matrix object for later autocorrelation test--
 wmatll<-mat2listw(as.matrix(ll))
 wmatdd<-mat2listw(as.matrix(dd))
 
 
 indpv<-c("foodscarc",###"femsubs",
 "fratgrpstr","marrcaptives","plunder",###"climate",
 "exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
 "war","himilexp","wagelabor",
 "famsize","settype","localjh","money",
 "cultints","roots","cereals","gath","hunt","fish",
 "anim","pigs","milk","plow","bovines","tree","foodtrade",
 "ndrymonth","ecorich",
 "popdens","pathstress","CVrain","rain","temp","AP1","AP2")
 
 
 #-----------------------------------------------------
 #---Estimate model on each imputed dataset------------
 #-----------------------------------------------------
 
 #--number of imputed datasets--
 #nimp<-10
 nimp<-3  ################################################################## speedup test
 
 #--will append values to these empty objects--
 vif<-NULL
 ss<-NULL
 beta<-NULL
 dng<-NULL
 
 #--loop through the imputed datasets--
 for (i in 1:nimp){
 
 #--select the ith imputed dataset--
 m9<-impdat[which(impdat$.imp==i),]
 #--retain only obs. for which dep. varb. is nonmissing--
 m9<-m9[zdv,]
 
 #--create spatially lagged dep. varbs.--
 y<-as.matrix(depvar)
 xx<-as.matrix(m9[,indpv])
 #--for instruments we use the spatial lag of our indep. varbs.--
 #--First, the spatially lagged varb. for distance--
 xdy<-dd%*%xx
 cyd<-dd%*%y
 o<-lm(cyd~xdy)
 #--the fitted value is our instrumental variable--
 fydd<-fitted(o)
 #--keep R2 from this regression--
 dr2<-summary(o)$r.squared
 #--Then, the spatially lagged varb. for language--
 xly<-ll%*%xx   
 cyl<-ll%*%y
 o<-lm(cyl~xly)
 #--the fitted value is our instrumental variable--
 fyll<-fitted(o)
 #--keep R2 from this regression--
 lr2<-summary(o)$r.squared
 m9<-cbind(m9,fydd,fyll)
 
 #--OLS estimate of unrestricted model--
 xUR<-lm(depvar~fyll+fydd++AP1+AP2+
 fratgrpstr+marrcaptives+plunder+###climate+
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
 settype+localjh+superjh+moralgods+fempower+###femsubs+
 sexratio+war+himilexp+money+wagelabor
 ,data=m9)
 #--OLS estimate of restricted model--
 xR<-lm(depvar~fyll+fydd+anim+
 wagelabor+war+pigs+popdens+ncmallow
 ,data=m9)
 
 # fydd+                ### fyll+ <--- ADDED fydd for distance clustering
 #  fratgrpstr   ###climate+
 #                                ###+gath+
 #  ###huntanim+++foodtrade+foodscarc+
 #  ###pathstress+                             +exogamy
 #  ###+superjh+
 #  +femsubs                   ###+
 #  ###war+wagelabor  ###sexratio+
 #  ,data=m9)
 #--corrected sigma2 and R2 for 2SLS--
 qxx<-m9
 qxx[,"fydd"]<-cyd
 qxx[,"fyll"]<-cyl
 b<-coef(xR)
 incpt<-matrix(1,NROW(qxx),1)
 x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
 e<-y-x%*%as.matrix(b)
 cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
 cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
 
 #--collect coefficients and their variances--
 ov<-summary(xR)
 vif<-rbind(vif,vif(xR))
 ss<-rbind(ss,diag(ov$cov*cs2))
 #--collect robust coef. variances when there is heteroskedasticity--
 #eb<-e^2
 #x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
 #hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
 #ss<-rbind(ss,diag(hcm))
 beta<-rbind(beta,coef(xR))
 
 #--collect some model diagnostics--
 dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
 
 
 #--Ramsey RESET test--
 p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
 #--Wald test (H0: dropped variables have coefficient equal zero)--
 o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
 p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
 #--Heteroskedasticity test (H0: homoskedastic residuals)--
 p3<-ncv.test(xR)$ChiSquare
 #--Shapiro-Wilke normality test (H0: residuals normal)
 p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
 #--LaGrange Multiplier test for spatial autocorrelation: language--
 o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
 p5<-as.numeric(o$LMlag$statistic)
 #--LaGrange Multiplier test for spatial autocorrelation: distance--
 o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
 p6<-as.numeric(o$LMlag$statistic)
 #--model R2--
 p7<-cr2
 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
 
 }
 
 
 #--------------------------------------------
 #--Rubin's formulas for combining estimates--
 #--------------------------------------------
 
 #--first find final regr. coefs. and p-values--
 mnb<-apply(beta,2,mean)
 ####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
 vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
 mnv<-apply(ss,2,mean)
 vrT<-mnv+vrb*(1-nimp^(-1))
 fst<-mnb^2/vrT
 r<-(1+nimp^(-1))*vrb/mnv
 v<-(nimp-1)*(1+r^(-1))^2
 pval<-pf(fst,1,v,lower.tail=FALSE)
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
 
 #--Then combine the diagnostics we collected--
 dng<-data.frame(dng)
 names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
 "R2:final model","R2:IV(distance)","R2:IV(language)")
 r2<-apply(dng[,7:9],2,mean)
 adng<-dng[,1:6]
 mdm<-apply(adng,2,mean)
 vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
 aa<-4*mdm^2-2*vrd
 aa[which(aa<0)]<-0
 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
 vd<-(nimp-1)*(1+rd^(-1))^2
 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
 #-All chi-sq we collected have df=1-------
 pvald<-pf(Dm,1,vd,lower.tail=FALSE)
 ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
 names(ccc)<-c("Fstat","df","pvalue")
 #--write results to csv file for perusal in spreadsheet--
 write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
 write.csv(bbb,file="OLSresults.csv",append=TRUE)
 write.csv(r2,file="OLSresults.csv",append=TRUE)
 write.csv(ccc,file="OLSresults.csv",append=TRUE)
 
 aaa<-c(table(depvar), NROW(depvar),depvarname)
 imp<-"number of imputations nimp="
 impute=c(imp,nimp)
 lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
 lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
 lon[90]=-131
 bbb
 r2
 ccc
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 bbb
 aaa
 impute

4th results

 r2
 R2:final model R2:IV(distance) R2:IV(language) 
      0.2977494       0.9715283       0.9843970 
>   ccc
                Fstat          df pvalue
RESET           0.843   87803.617  0.359
Wald on restrs. 0.149    1601.024  0.699
NCV             3.164 1750200.749  0.075
SWnormal        0.002 3418932.160  0.967
lagll           0.417  578834.079  0.519
lagdd           0.865 3077053.852  0.352
>   bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
>   bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
>   bbb
                mnb    fst            v  pval   VIF
(Intercept) -36.969  4.114 5.379278e+13 0.043    NA
fyll          1.587 12.995 3.069155e+04 0.000 1.882
fydd          0.249  0.657 1.164491e+05 0.417 2.329
anim         -2.481  3.921 2.821465e+07 0.048 1.505-->sign
wagelabor     2.689  1.503 8.424910e+02 0.221 1.229
war           0.502  3.674 6.949300e+01 0.059 1.031-->sign
pigs          6.339  1.796 2.222197e+04 0.180 1.222
popdens      -2.636  4.832 6.405527e+04 0.028 1.076--.sign
ncmallow      0.585  0.621 1.838360e+02 0.432 1.153-->delete
>   aaa
             0              5              6             10             15 
           "8"            "7"            "1"            "4"            "9" 
            19             20             25             30             31 
           "2"            "9"            "9"            "5"            "2" 
            33             35             38             40             44 
           "1"           "10"            "1"           "12"            "2" 
            45             50             55             56             60 
           "9"            "9"            "6"            "2"            "4" 
            63             65             67             70             75 
           "1"           "11"            "1"            "5"            "3" 
            80             85             95            100                
           "3"            "4"            "1"            "1"          "142" 
               
"PctFemContAg" 
>   impute
[1] "number of imputations nimp=" "3"                          
> 
>

run 5

  1. Program 2 (12-18-2009)==
 #MI--estimate model, combine results
 
 rm(list=ls(all=TRUE))
 #--Set path to your directory with data and program--
 setwd("C:/My Documents/MI")
 options(echo=TRUE)
 
 #--need these packages for estimation and diagnostics--
 library(foreign)
 library(spdep)
 library(car)
 library(lmtest)
 library(sandwich)
 
 #-----------------------------
 #--Read in data, rearrange----
 #-----------------------------
 
 #--Read in original SCCS data---
 load("SCCS.Rdata",.GlobalEnv)
 #--Read in two weight matrices--
 ll<-as.matrix(read.dta("langwm.dta")[,-1])
 dd<-as.matrix(read.dta("dist25wm.dta")[,c(-1,-2,-189)])
 #--Read in the imputed dataset---
 load("impdat.Rdata",.GlobalEnv)
 
 #--create dep.varb. you wish to use from SCCS data--
 #--Here we sum variables measuring how much a society values children--
 #depvar<-apply(SCCS[,c("v473","v474","v475","v476")],1,sum) #can replace "sum" with "max"
 depvar<-SCCS$v821   ###log(1+SCCS$v821)  ###
 #--find obs. for which dep. varb. is non-missing--
 zdv<-which(!is.na(depvar))
 depvar<-depvar[zdv]
 #HERE GIVE THE "NAME" OF THE DEPENDENT VARIABLE THAT IS COMPUTED
 depvarname<-"PctFemContAg"
 #--can add additional SCCS variable, but only if it has no missing values---
 dateobs<-SCCS$v838
 dateobs<-dateobs[zdv]
 #--look at histogram and frequencies for the dep. varb.--
 hist(depvar)
 table(depvar)
 
 #--modify weight matrices---
 #--set diagonal equal to zeros--
 diag(ll)<-0
 diag(dd)<-0
 #--use only obs. where dep. varb. non-missing--
 ll<-ll[zdv,zdv]
 dd<-dd[zdv,zdv]
 #--row standardize (rows sum to one)
 ll<-ll/rowSums(ll)
 dd<-dd/rowSums(dd)
 #--check to see that rows sum to one
 rowSums(ll)
 rowSums(dd)
 #--make weight matrix object for later autocorrelation test--
 wmatll<-mat2listw(as.matrix(ll))
 wmatdd<-mat2listw(as.matrix(dd))
 
 
 indpv<-c("foodscarc",###"femsubs",
 "fratgrpstr","marrcaptives","plunder",###"climate",
 "exogamy","ncmallow","superjh","moralgods","fempower","sexratio",
 "war","himilexp","wagelabor",
 "famsize","settype","localjh","money",
 "cultints","roots","cereals","gath","hunt","fish",
 "anim","pigs","milk","plow","bovines","tree","foodtrade",
 "ndrymonth","ecorich",
 "popdens","pathstress","CVrain","rain","temp","AP1","AP2")
 
 
 #-----------------------------------------------------
 #---Estimate model on each imputed dataset------------
 #-----------------------------------------------------
 
 #--number of imputed datasets--
 #nimp<-10
 nimp<-3  ################################################################## speedup test
 
 #--will append values to these empty objects--
 vif<-NULL
 ss<-NULL
 beta<-NULL
 dng<-NULL
 
 #--loop through the imputed datasets--
 for (i in 1:nimp){
 
 #--select the ith imputed dataset--
 m9<-impdat[which(impdat$.imp==i),]
 #--retain only obs. for which dep. varb. is nonmissing--
 m9<-m9[zdv,]
 
 #--create spatially lagged dep. varbs.--
 y<-as.matrix(depvar)
 xx<-as.matrix(m9[,indpv])
 #--for instruments we use the spatial lag of our indep. varbs.--
 #--First, the spatially lagged varb. for distance--
 xdy<-dd%*%xx
 cyd<-dd%*%y
 o<-lm(cyd~xdy)
 #--the fitted value is our instrumental variable--
 fydd<-fitted(o)
 #--keep R2 from this regression--
 dr2<-summary(o)$r.squared
 #--Then, the spatially lagged varb. for language--
 xly<-ll%*%xx   
 cyl<-ll%*%y
 o<-lm(cyl~xly)
 #--the fitted value is our instrumental variable--
 fyll<-fitted(o)
 #--keep R2 from this regression--
 lr2<-summary(o)$r.squared
 m9<-cbind(m9,fydd,fyll)
 
 #--OLS estimate of unrestricted model--
 xUR<-lm(depvar~fyll+fydd++AP1+AP2+
 fratgrpstr+marrcaptives+plunder+###climate+
 cultints+roots+cereals+gath+plow+
 hunt+fish+anim+pigs+milk+bovines+tree+foodtrade+foodscarc+
 ecorich+popdens+pathstress+exogamy+ncmallow+famsize+
 settype+localjh+superjh+moralgods+fempower+###femsubs+
 sexratio+war+himilexp+money+wagelabor
 ,data=m9)
 #--OLS estimate of restricted model--
 xR<-lm(depvar~fyll+fydd+anim+
 war+wagelabor+popdens
 ,data=m9)
 
 # fydd+                ### fyll+ <--- ADDED fydd for distance clustering
 #  fratgrpstr   ###climate+
 #                                ###+gath+
 #  ###huntanim+++foodtrade+foodscarc+
 #  ###pathstress+                             +exogamy
 #  ###+superjh+
 #  +femsubs                   ###+
 #  ###war+wagelabor  ###sexratio+
 #  ,data=m9)
 #--corrected sigma2 and R2 for 2SLS--
 qxx<-m9
 qxx[,"fydd"]<-cyd
 qxx[,"fyll"]<-cyl
 b<-coef(xR)
 incpt<-matrix(1,NROW(qxx),1)
 x<-as.matrix(cbind(incpt,qxx[,names(b)[-1]]))
 e<-y-x%*%as.matrix(b)
 cs2<-as.numeric(t(e)%*%e/(NROW(x)-NCOL(x)))
 cr2<-as.numeric(1-t(e)%*%e/sum((y-mean(y))^2))
 
 #--collect coefficients and their variances--
 ov<-summary(xR)
 vif<-rbind(vif,vif(xR))
 ss<-rbind(ss,diag(ov$cov*cs2))
 #--collect robust coef. variances when there is heteroskedasticity--
 #eb<-e^2
 #x<-as.matrix(cbind(incpt,m9[,names(b)[-1]]))
 #hcm<-inv(t(x)%*%x)%*%t(x)%*%diag(eb[1:length(eb)])%*%x%*%inv(t(x)%*%x)
 #ss<-rbind(ss,diag(hcm))
 beta<-rbind(beta,coef(xR))
 
 #--collect some model diagnostics--
 dropt<-c("fratgrpstr","foodscarc","pathstress","superjh","sexratio")
 
 
 #--Ramsey RESET test--
 p1<-qchisq(resettest(xR,type="fitted")$"p.value",1,lower.tail=FALSE)
 #--Wald test (H0: dropped variables have coefficient equal zero)--
 o<-linear.hypothesis(xUR,dropt,test="Chisq")$"Pr(>Chisq)"[2]
 p2<-qchisq(o,1,lower.tail=FALSE) #find Chisq with 1 d.f. and same pvalue
 #--Heteroskedasticity test (H0: homoskedastic residuals)--
 p3<-ncv.test(xR)$ChiSquare
 #--Shapiro-Wilke normality test (H0: residuals normal)
 p4<-qchisq(shapiro.test(residuals(xR))$p.value,1,lower.tail=FALSE)
 #--LaGrange Multiplier test for spatial autocorrelation: language--
 o<-lm.LMtests(xR, wmatll, test=c("LMlag"))
 p5<-as.numeric(o$LMlag$statistic)
 #--LaGrange Multiplier test for spatial autocorrelation: distance--
 o<-lm.LMtests(xR, wmatdd, test=c("LMlag"))
 p6<-as.numeric(o$LMlag$statistic)
 #--model R2--
 p7<-cr2
 dng<-rbind(dng,cbind(p1,p2,p3,p4,p5,p6,p7,dr2,lr2))
 
 }
 
 
 #--------------------------------------------
 #--Rubin's formulas for combining estimates--
 #--------------------------------------------
 
 #--first find final regr. coefs. and p-values--
 mnb<-apply(beta,2,mean)
 ####################vrb<-colSums((beta-t(matrix(mnb,length(mnb),10)))^2)/(nimp-1)
 vrb<-colSums((beta-t(matrix(mnb,length(mnb),nimp)))^2)/(nimp-1)
 mnv<-apply(ss,2,mean)
 vrT<-mnv+vrb*(1-nimp^(-1))
 fst<-mnb^2/vrT
 r<-(1+nimp^(-1))*vrb/mnv
 v<-(nimp-1)*(1+r^(-1))^2
 pval<-pf(fst,1,v,lower.tail=FALSE)
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),8)) #,3 changed to .8 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 names(bbb)<-c("coef","Fstat","ddf","pvalue","VIF")
 
 #--Then combine the diagnostics we collected--
 dng<-data.frame(dng)
 names(dng)<-c("RESET","Wald on restrs.","NCV","SWnormal","lagll","lagdd",
 "R2:final model","R2:IV(distance)","R2:IV(language)")
 r2<-apply(dng[,7:9],2,mean)
 adng<-dng[,1:6]
 mdm<-apply(adng,2,mean)
 vrd<-colSums((adng-t(matrix(mdm,length(mdm),nimp)))^2)/(nimp-1)
 aa<-4*mdm^2-2*vrd
 aa[which(aa<0)]<-0
 rd<-(1+nimp^(-1))*vrd/(2*mdm+aa^.5)
 vd<-(nimp-1)*(1+rd^(-1))^2
 Dm<-(mdm-(nimp-1)/(nimp+1)*rd)/(1+rd)
 #-All chi-sq we collected have df=1-------
 pvald<-pf(Dm,1,vd,lower.tail=FALSE)
 ccc<-data.frame(round(cbind(Dm,vd,pvald),3))
 names(ccc)<-c("Fstat","df","pvalue")
 #--write results to csv file for perusal in spreadsheet--
 write.csv("==OLS model for depvar==",file="OLSresults.csv",append=FALSE)
 write.csv(bbb,file="OLSresults.csv",append=TRUE)
 write.csv(r2,file="OLSresults.csv",append=TRUE)
 write.csv(ccc,file="OLSresults.csv",append=TRUE)
 
 aaa<-c(table(depvar), NROW(depvar),depvarname)
 imp<-"number of imputations nimp="
 impute=c(imp,nimp)
 lat=-SCCS$v179*(SCCS$v180-1)+SCCS$v179*(2-SCCS$v180)
 lon=-SCCS$v181*(SCCS$v182-3)+SCCS$v179*(4-SCCS$v182)
 lon[90]=-131
 bbb
 r2
 ccc
 bbb<-data.frame(round(cbind(mnb,fst,v,pval),3)) #.8 changed to back to .3 for significance test
 bbb$VIF[2:NROW(bbb)]<-round(apply(vif,2,mean),3)
 bbb
 aaa
 impute

final results

 R2:final model R2:IV(distance) R2:IV(language) 
      0.2810274       0.9715283       0.9843970 
                Fstat          df pvalue  "PctFemContAg"
RESET           0.586   12218.688  0.444 <-no vars need log
Wald on restrs. 0.149    1601.024  0.699 <-no var missing
NCV             2.417   13632.135  0.120 <-no bunching
SWnormal        0.006 1981789.848  0.936 <-normal distrib errors
lagll           0.411  377470.146  0.522 <-no lang autocor
lagdd           0.832 1907553.379  0.362 <-no dist autocor
                mnb    fst          v  pval   VIF
(Intercept) -30.645  2.942 417585.374 0.086    NA
fyll          1.399 10.982  75919.214 0.001 1.717
fydd          0.399  1.854 354805.414 0.173 2.099
anim         -2.750  4.911 965958.604 0.027 1.464
war           0.446  2.968     83.698 0.089 1.010
wagelabor     2.332  1.151   1114.314 0.284 1.200 <-- drop for a final xR model
popdens      -2.356  3.912  48757.127 0.048 1.052
             0              5              6             10             15 
           "8"            "7"            "1"            "4"            "9" 
            19             20             25             30             31 
           "2"            "9"            "9"            "5"            "2" 
            33             35             38             40             44 
           "1"           "10"            "1"           "12"            "2" 
            45             50             55             56             60 
           "9"            "9"            "6"            "2"            "4" 
            63             65             67             70             75 
           "1"           "11"            "1"            "5"            "3" 
            80             85             95            100  "PctFemContAg"               
           "3"            "4"            "1"            "1"          "142"

3rd results negative with animals which makes good sense

As husbandry increase, and males do the husbandry, women's % contrib to agriculture drops, i.e., men also do more agriculture. The r2 here is quite good. Any other indep vars that might be predictive that are not already coded? Doug 15:10, 10 February 2010 (PST) I did this step for you since you were coming along nicely.

Also tested adding plow, which has no effect.
R2:final model R2:IV(distance) R2:IV(language) 
      0.2391168       0.9721403       0.9850290 
                Fstat           df pvalue
RESET           0.573     4439.743  0.449
Wald on restrs. 0.263    18727.343  0.608
NCV             2.128  1635098.385  0.145
SWnormal        0.650    89069.153  0.420
lagll           0.420 17233211.516  0.517
lagdd           0.822    10488.142  0.365
                mnb    fst          v  pval   VIF
(Intercept) -28.481  3.571 346042.425 0.059    NA
fyll          1.410 11.324  54004.615 0.001 1.633
fydd          0.373  1.557   4202.249 0.212 2.097
anim         -2.801  5.032  43145.334 0.025 1.427

plot language

lat<-SCCS$v833.1 lon<-SCCS$v833.2 plot(lon,lat, cex=.1) z=depvar ztxt=as.character(z) text(lon,lat,ztxt)

map

Jasiell map.jpg

Personal tools