# Imputing data for Regression Analysis

From InterSciWiki

Author(s) Douglas R. White, Anthon Eff, Malcolm Dow, 2009

### Eff and Dow

- Eff, E. Anthon, and Malcolm Dow. 2009. How to Deal with Missing Data and Galton's Problem in Cross-Cultural Survey Research: A Primer for R. Structure and Dynamics: eJournal of Anthropological and Related Sciences 3(3), Article 1. (Eff&Dow2009.pdf; previous draft in pdf: Imputing missing data)

- 0. Mac users Make Directory
  **To download the R files for your analysis** (if C:My Documents/MI does not already contain them): **make the directory C:My Documents/MI** and download http://intersci.ss.uci.edu/wiki/pw/Eff&Dow_data&programs.zip into it. Open "My Computer", navigate to C:My Documents/MI, and click 2SLS (the zip file). With the cursor inside the zip file, press Control-A to highlight all files and click "copy". Then place the cursor in the C:My Documents/MI directory (outside the zip file) and paste (Control-V). This should copy all the files the programs need into C:My Documents/MI.
- 1. Make the imputed datasets
- 2. Estimate model, combine results
- SCCS Variables in R for your assignment
- Human Social Complexity and World Cultures 2009 (2009 Course website)
- Human_Social_Complexity_-_World_Cultures_Survey_2010#Overview (2010 Course website)

- Dow, Malcolm M., and E. Anthon Eff. 2009b. Multiple Imputation of Missing Data in Cross-Cultural Samples. Cross-Cultural Research. 43(3):206-229.
- Dow, Malcolm M., and E. Anthon Eff. 2009a. “Cultural Trait Transmission and Missing Data as Sources of Bias in Cross-Cultural Survey Research: Explanations of Polygyny Re-examined.” Cross-Cultural Research. 43(2): 134-151.
- Eff, E. Anthon, and Malcolm M. Dow. 2008. “Do Markets Promote Prosocial Behavior? Evidence from the Standard Cross-Cultural Sample.” http://econpapers.repec.org/paper/mtswpaper/200803.htm.
- Eff, E. Anthon, and Malcolm M. Dow. 2009. Market integration and pro-social behavior. To appear in Robert C. Marshall, Editor. Cooperation in Economic and Social Life. Society for Economic Anthropology Monographs Vol 26. AltaMira Press: Walnut Creek, CA.
- Eff, E. Anthon. 2008. "Weight Matrices for Cultural Proximity: Deriving Weights from a Language Phylogeny." Structure and Dynamics: eJournal of Anthropological and Related Sciences 3(2), Article 9. http://repositories.cdlib.org/imbs/socdyn/sdeas/vol3/iss2/art9
- King, G., Honaker, J., Joseph, A., & Scheve, K. 2001. Analyzing incomplete political science data: an alternative algorithm for multiple imputation. American Political Science Review 95: 49-69.

- For an EduMod example of a student paper using a variant of this method, see Sarah Baitzel's project, **Finding a better model for Eff's study of Average Adult Female Contributions to Subsistence**. It does not use the same method but an earlier one developed by Anthon Eff.

## EduMod

- Copy the last unused label (in red) several times and correct the numeric series.
- Copy your code into your EduMod page
- Then Edit the (dep var) in the first unused label (in red), save, and click
- EduMod 0: Sum(valchild) Imputation and Regression
- EduMod 1: Max(valchild) Imputation and Regression
- EduMod 2: polygyny Imputation and Regression
- Copy from EduMod-3 (just below) to your own page in this numbered list. Do not use the last red line itself: edit this page, copy the last line at the bottom, and increment its number. Open EduMod-3 and copy its contents, then return here with the back arrow. Open the red line with the next number and paste your copy there. You can then edit your copy, run it in R, and post the results under the results header.
- EduMod-3: (sample dep var, copy from here) Imputation and Regression
- **You can edit and copy with Control-A from EduMod-4, not EduMod-3.**
- EduMod-4: (your dep var here) Imputation and Regression
- EduMod-5: (your dep var here) Imputation and Regression

### EduMod for 2009 Classroom lab001 and PCs at home

- Final Models and Commentary - User:Alexander George-Johnson
- (25 students active here on PCs - WHERE ARE THE OTHER FOUR?)
- EduMod-6: (your dep var here) Imputation and Regression
- EduMod-7: (your dep var here) Imputation and Regression
- EduMod-8: (your dep var here) Imputation and Regression
- EduMod-9: (your dep var here) Imputation and Regression
- EduMod-10: (your dep var here) Imputation and Regression
- EduMod-11: Imputation and Regression - User:Amanda McDonald
- EduMod-12: (your dep var here) Imputation and Regression
- EduMod-13: (your dep var here) Imputation and Regression
- EduMod-14: (your dep var here) Imputation and Regression
- EduMod-15: (your dep var here) Imputation and Regression
- EduMod-16: (your dep var here) Imputation and Regression
- EduMod-17: (your dep var here) Imputation and Regression
- EduMod-18: (your dep var here) Imputation and Regression
- EduMod-19: (your dep var here) Imputation and Regression
- EduMod-20: (your dep var here) Imputation and Regression
- EduMod-21: (your dep var here) Imputation and Regression
- EduMod-22: (your dep var here) Imputation and Regression
- EduMod-23: (your dep var here) Imputation and Regression
- EduMod-24: (your dep var here) Imputation and Regression
- EduMod-26: (your dep var here) Imputation and Regression
- EduMod-27: (your dep var here) Imputation and Regression
- EduMod-28: (your dep var here) Imputation and Regression
- EduMod-29: (your dep var here) Imputation and Regression
- EduMod-30: (your dep var here) Imputation and Regression
- EduMod-31: (your dep var here) Imputation and Regression
- EduMod-31: Imputation and Regression - Alex
- EduMod-32: Imputation and Regression - Alex
- EduMod-33: (your dep var here) Imputation and Regression
- EduMod-34: (your dep var here) Imputation and Regression
- EduMod-35: (your dep var here) Imputation and Regression
- EduMod-36: (your dep var here) Imputation and Regression
- EduMod-37: (your dep var here) Imputation and Regression

- EduMod-39: (your dep var here) Imputation and Regression
- EduMod-40: (your dep var here) Imputation and Regression
- EduMod-41: (your dep var here) Imputation and Regression
- EduMod-42: (your dep var here) Imputation and Regression Depvar=sexratio714 v714

### EduMod for Winter 2010 Classroom lab002 and PCs at home

- Copy the full contents of a completed blue page into the first unused page ("take" or red link) and add your name to mark your worksite. Save your page, then edit the "before" and "after" links on your page and add your name there too.
- Go to "Programs" and run R
- Do a test run copying the PROGRAM contents of EduMod-54 (or of your own page once you copy the test program there) and paste into R
- Talk:Human Social Complexity - World Cultures 2010
- EduMod-48: Imputation and Regression v679 - D. White 1-30-2010 warfight
- EduMod-49: Imputation and Regression v892 - D. White 1-29-2010 External war
- EduMod-50: Imputation and Regression v570 - D. White 1-30-2010 Fraternal interest groups
- EduMod-51: Imputation and Regression v2101 v1767 v1769 v17 - D. White 1-28-2010 re: Pryor
- EduMod-52: Imputation and Regression Polygyny - D. White 12-18-2009 - protected
- EduMod-53: Imputation and Regression - User:Douglas R. White - v882 sororal polygyny
- EduMod-54: Imputation and Regression - User:Douglas R. White - v880 polygyny = can copy to your page
- EduMod-31: Imputation and Regression - User:Alexander George-Johnson v892 external war
- EduMod-55: Imputation and Regression - User:CCrager v661 FemPartic
- EduMod-56: Imputation and Regression - User talk:GloriaM - v557 mythicfounders
- EduMod-57: Imputation and Regression - User:jasiellt - v821 PctFemContAg
- EduMod-58: Imputation and Regression - User:Bwilliams - v649 Theories of fate
- EduMod-59: Imputation and Regression - User:Tsalunga - v57 Money
- EduMod-60: Imputation and Regression - User:Ariana Keil - Your dep var and vNum
- EduMod-61: Imputation and Regression - User:Yourname - Your dep var and vNum
- EduMod-62: Imputation and Regression - User:Yourname - Your dep var and vNum
- EduMod-63: Imputation and Regression - User:Yourname - Your dep var and vNum
- EduMod-64: Imputation and Regression - User:LeonChoi - v678 famine
- EduMod-65: Imputation and Regression - User:Roblesn **new-Police** - Your dep var and vNum
- EduMod-66: Imputation and Regression - User:Dante Anton - v1721 wealthy
- EduMod-67: Imputation and Regression - User:EStanfield - v1189 evil eye
- EduMod-68: Imputation and Regression - User:Roblesn - v90 Police
- EduMod-69: Imputation and Regression - User:Yourname - Your dep var and vNum
- EduMod-70: Imputation and Regression - User:Flackj - v1472 FormalEd

### EduMod for Fall 2010 Classroom lab003 and PCs at home

- EduMod71 - User:Douglas_R._White - depvar=sum(v473 & v475) valchild boys
- EduMod72 - User:Douglas_R._White - depvar=sum(v860) general polygyny
- EduMod73 - User:Douglas_R._White - depvar=sum(v474 & v477) valchild girls
- EduMod74 - User:Douglas_R._White - depvar=sum(v473-v476) valchild
- EduMod75 - User:Douglas_R._White - depvar=sum(v473-v476) valchild
- EduMod76 - User:Your_Login_Name - depvar=sum(v473-v476) valchild
- EduMod77 - User:Your_Login_Name - depvar=sum(v473-v476) valchild
- EduMod78 - D White - Evil eye
- EduMod79 - D White - Money
- EduMod80 - D White - Money - not Moral gods
- EduMod81 - User:Tolga Oztan - EvilEye
- EduMod82 - D White - CaststratLGd (Caste stratification logged)
- EduMod83 - D White - Milking
- EduMod84 - D White
- EduMod85 - D White Maximizing in Jajmaniland: A Model of Caste Relations
- EduMod86 - D White More restricted variables: A Model of Caste Relations
- EduMod87 - D White
- EduMod88 - D White

### EduMod for macs

#### fall 2010

- SCCS R package - fall 2010 (new)
- Edu-Mod 2009-10: The Individual Studies from 2009 and winter 2010
- If you are using R Commander on a Linux netbook, some packages are already included; others can be installed from the R prompt, for example:

- install.packages("mice")
- install.packages("tripack")

- Nonworking *Rccs* models
- CreateModelValueSDWchildren.R - Working *Rccs* models#Value_of_children
- CreateModelDRWpolygyny.R - Working *Rccs* models#General polygyny with few variables DRW
- CreateModelDRWpolygyny2.R - Working *Rccs* models#General polygyny with more variables DRW

#### 2009

- Downloading Packages for Mac
- Mac version working; see Mac users Make Directory and winter 2010
- EduMod Mac-10: Imputation and Regression Ari Aszmuilo
- EduMod Mac-3: (sample dep var, copy from here) Imputation and Regression NOT YET READY FOR PRIME TIME, MAC USERS!
- EduMod Mac-4: (your dep var here) Imputation and Regression MACFOLKS - please use the classroom PCs until I get this fixed Doug 20:15, 1 October 2009 (PDT)
- EduMod Mac-5: (your dep var here) Imputation and Regression fall 2009
- EduMod Mac-6: (your dep var here) Imputation and Regression fall 2009
- EduMod Mac-7: (your dep var here) Imputation and Regression fall 2009
- EduMod Mac-8: (your dep var here) Imputation and Regression fall 2009
- EduMod Mac-9: (your dep var here) Imputation and Regression fall 2009

## ... data for R programs

- For data to run programs 1 and 2, see Imputing_the_data#Eff_and_Dow above.
- The GIS data can be downloaded from http://frank.mtsu.edu/~eaeff/downloads/vaux.Rdata.
- The SCCS data can be downloaded from http://frank.mtsu.edu/~eaeff/downloads/SCCS.Rdata.
- The R version of the SPSS SCCS data is at http://eclectic.anthrosciences.org/~drwhite/courses/index.html
- The autocorrelation weight matrices from Eff and Dow (2009) that are used in the program can be downloaded as follows:

- Geographic proximity matrix (25 closest neighbors for each society) http://frank.mtsu.edu/~eaeff/downloads/dist25wm.dta.
- Linguistic proximity matrix http://frank.mtsu.edu/~eaeff/downloads/langwm.dta.
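
Once the two weight-matrix files are saved locally, they can be read into R. The following is only a sketch: it assumes the `.dta` files are Stata files in a version that `read.dta()` from the `foreign` package can read, and it assumes local copies named `dist25wm.dta` and `langwm.dta` in the working directory. The helper name `read_weight_file` is made up for illustration, and the column layout of the files is not described here, so inspect the result before using it.

```r
library(foreign)  # provides read.dta() for Stata files

# Hypothetical helper: read one Stata file and return it as a data frame.
read_weight_file <- function(path) {
  stopifnot(file.exists(path))
  read.dta(path)
}

# Assumed local file names; nothing happens if the files are absent.
for (path in c("dist25wm.dta", "langwm.dta")) {
  if (file.exists(path)) {
    w <- read_weight_file(path)
    cat(path, ": ", nrow(w), " rows, ", ncol(w), " columns\n", sep = "")
  }
}
```

Checking `dim()` and `head()` of each result shows whether the file holds a plain square matrix or carries extra identifier columns that need to be dropped first.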

**About this EduMod Page**

- EduMod-0: Sum(valchild) Imputation and Regression
- R Program 1 from Eff and Dow (2009): Make the imputed datasets
- R Program 2 from Eff and Dow (2009): Estimate model, combine results
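
The "combine results" step in Program 2 pools the estimates obtained from each imputed dataset. The sketch below shows the standard combining rules from Rubin (1987, listed under References) for a single coefficient. It is not the Eff and Dow program itself: the function name `pool_rubin` and the example numbers are invented for illustration.

```r
# Pool one coefficient across m imputed datasets with Rubin's (1987) rules.
# est: the m point estimates; se: their m standard errors.
pool_rubin <- function(est, se) {
  stopifnot(length(est) == length(se), length(est) >= 2)
  m     <- length(est)
  qbar  <- mean(est)                 # pooled point estimate
  ubar  <- mean(se^2)                # within-imputation variance
  b     <- var(est)                  # between-imputation variance
  total <- ubar + (1 + 1 / m) * b    # total variance
  # Rubin's degrees of freedom for the t reference distribution
  df <- (m - 1) * (1 + ubar / ((1 + 1 / m) * b))^2
  list(estimate = qbar, se = sqrt(total), df = df)
}

# Made-up estimates from three imputed datasets
pooled <- pool_rubin(est = c(0.50, 0.60, 0.40), se = c(0.10, 0.10, 0.10))
str(pooled)
```

The pooled standard error is larger than any single-dataset standard error whenever the estimates differ across imputations, which is how the uncertainty due to the missing data enters the final result.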

## mix and mice R packages and data for R programs

- mix Schafer, Joseph L 2007. mix: Estimation/multiple Imputation for Mixed Categorical and Continuous Data. R package version 1.0-6. http://www.stat.psu.edu/~jls/misoftwa.html
- mice Van Buuren, S. & C.G.M. Oudshoorn 2007. mice: Multivariate Imputation by Chained Equations. R package version 1.16. http://web.inter.nl.net/users/S.van.Buuren/mi/hmtl/mice.htm
- These are used in Eff and Dow (2009).
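
Before running the programs it can help to check whether both packages are already installed. The following is a minimal sketch, assuming both packages are still available from your CRAN mirror; the helper name `missing_packages` is made up for illustration, and the install call is left commented out so that nothing is downloaded unless you choose to.

```r
# Return the packages in `needed` that are not among `installed`.
missing_packages <- function(needed,
                             installed = rownames(installed.packages())) {
  setdiff(needed, installed)
}

needed     <- c("mix", "mice")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```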

## Links to SPSS

- Comparative research tools
- Standard Cross-Cultural Sample
- Human Social Complexity and World Cultures 2009 UCI course: fall
- SCCS index of variables for on-line data
- Anthon Eff
- Malcolm Dow
- Galton's problem and autocorrelation

## References

- Halbert White's robust variance-covariance matrix referred to on page 11:

- White, Halbert. 1980. A Heteroscedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroscedasticity. Econometrica 48:817-838. See: http://support.sas.com/rnd/app/examples/ets/hetero/index.htm

- Anselin, L., Bera, A. K., Florax, R. and Yoon, M. J. 1996. Simple diagnostic tests for spatial dependence. Regional Science and Urban Economics, 26, 77–104.
- Adalta software. 2009. SOLAS 3.0 for Missing Data Analysis. "SOLAS™ 3.0 for Missing Data Analysis offers principled approaches to missing data now has its own scripting language and features a choice of 6 imputation techniques, including 2 Multiple Imputation techniques based on the work of Prof. Donald B. Rubin. Data can be imported from a wide variety of file types including SAS (Unix/Windows), SPSS, Splus, Stata and many more. Once the data is imported, the missing data pattern can be displayed and a decision upon the most appropriate technique made. Once imputation is complete the imputed datasets can be analysed within SOLAS or exported to a variety of other packages in the correct format."
- Das, D., Kelejian, H., & Prucha, I. (2003). **Finite sample properties of estimators of spatial autoregressive models with autoregressive disturbances**. Papers in Regional Science, 82, 1-26.
- Farebrother, R. W. 1995. The Exact Distribution of the Lagrange Multiplier Test for Heteroskedasticity. Econometric Theory 11(04):803-804.
- Kelejian, H., & Prucha, I. (1998). **A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances**. Journal of Real Estate Finance and Economics, 17, 99-121.
- Kelejian, H., Prucha, I., & Yuzefovich, Y. (2004). **Instrumental variable estimation of a spatial autoregressive model with autoregressive disturbances: Large and small sample results**. In J. P. LeSage & R. Pace (Eds.), Spatial and spatiotemporal econometrics (Vol. 18, Advances in Econometrics, pp. 163-198). Oxford, UK: Elsevier.
- Kelejian, H., & Robinson, D. (1993). A suggested method of estimation for spatial interdependent models with autocorrelated errors, and an application to a county expenditures model. Papers in Regional Science, 72, 297-312.
- Land, K., & Deane, G. (1992). **On the large-sample estimation of regression models with spatial or network-effects terms: A two-stage least squares approach**. In P. Marsden (Ed.), Sociological methodology 1992 (pp. 221-248). Oxford, UK: Blackwell.
- Ramsey, J.B. (1969) "Tests for Specification Errors in Classical Linear Least Squares Regression Analysis", J. Roy. Statist. Soc. B., 31(2), 350–371. See: Wikipedia:Ramsey_RESET_test
- Rubin, Donald B. 1976. Inference and missing data. Biometrika 63(3):581-592; doi:10.1093/biomet/63.3.581
- Rubin, Donald B. 1977. “Formalizing Subjective Notions about the Effect of Nonrespondents in Sample Surveys.” Journal of the American Statistical Association 72: 538-543.
- See Also MCMC, MI, Chained Equations: Paul Zhang. 2003. Multiple Imputation: Theory and Method. Internat. Statist. Rev. 71(3): 581-592.
- Rubin, Donald B. 1987. Multiple imputation for Nonresponse in Surveys. New York: John Wiley & Sons.
- Rubin, Donald B. (1996) Multiple Imputation after 18+ Years (with discussion). Journal of the American Statistical Association 91: 473-489.
- Rubin, Donald B. (1981), The Bayesian Bootstrap, The Annals of Statistics, 9, 130-134.
- Rubin, Donald B. >2000. Software for Multiple imputation Statistical Solutions.
- Rubin, Donald B. 2004. The Design of a General and Flexible System for Handling Nonresponse in Sample Surveys. The American Statistician, 58(4): 298–302.
- See also: http://www.ssc.upenn.edu/~allison/MultInt99.pdf
- See also: Catherine Montalto and Sherman Hanna (post 1987) Can repeated-imputation inference (RII) techniques be used with nonlinear models? The short answer is YES, RII techniques are applicable to both linear and nonlinear models. The criteria for determining whether RII techniques are appropriate is independent of the functional form of the estimating equation. RII techniques are appropriate whenever the complete-data analysis inferences are based on estimates and standard errors. These estimates can include population means, variances, correlations, factor loadings, and regression coefficients.

- Shapiro, S. S. and Wilk, M. B. (1965). "An analysis of variance test for normality (complete samples)", Biometrika, Vol. 52, No. 3/4, pages 591–611. JSTOR: 2333709. See: Wikipedia:Shapiro–Wilk test