# EduMod Mac-4: (your dep var here) Imputation and Regression

From InterSciWiki

Author(s) Douglas R. White, Anthon Eff, Malcolm Dow, 2009

## Contents

### Eff and Dow

- Eff, E. Anthon, and Malcolm Dow. 2009. How to Deal with Missing Data and Galton's Problem in Cross-Cultural Survey Research: A Primer for R. Structure and Dynamics: eJournal of Anthropological and Related Sciences 3(3), Article 1.

- 0. Mac users: Make Directory
**To download the R files for your analysis**, if C:My Documents/MI does not already contain them: **make the directory C:My Documents/MI** and download http://intersci.ss.uci.edu/wiki/pw/Eff&Dow_data&programs.zip into it. Open "My Computer", navigate to C:My Documents/MI, click 2SLS (the zip file), move the cursor inside the zip file, press "control-A" to highlight all files, click "copy", place the cursor in the C:My Documents/MI directory (outside the zip file), and paste ("control-V"). This should copy all the files the programs need into the C:My Documents/MI directory.
- 1. Make the imputed datasets
- 2. Estimate model, combine results
- SCCS Variables in R for your assignment
- Human Social Complexity and World Cultures 2009 (Course website)
- For an EduMod example of a student paper using a variant of this method, see Sarah Baitzel's project,
**Finding a better model for Eff's study of Average Adult Female Contributions to Subsistence**. It does not use the same method but a previous one developed by Anthon Eff.
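
Step 0 above (make the directory, then copy every file out of the zip into it) can also be scripted. The following is a minimal sketch in Python, given for illustration only since the Eff and Dow programs themselves are in R. It assumes the zip file has already been downloaded; the function name `unpack_programs` and the folder in the usage note are hypothetical and not part of the original instructions.

```python
import zipfile
from pathlib import Path


def unpack_programs(zip_path, target_dir):
    """Create target_dir if needed and extract every file from the zip into it.

    Returns the sorted names of the archive members that were extracted.
    """
    target = Path(target_dir).expanduser()
    # Same role as "Make directory" in step 0; harmless if it already exists.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Same role as select-all, copy, and paste out of the zip file.
        archive.extractall(target)
        return sorted(archive.namelist())
```

A call such as `unpack_programs("Eff&Dow_data&programs.zip", "~/Documents/MI")` would place the files in an `MI` folder under the user's Documents folder. That location is an assumption; use whichever directory your R programs expect.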

- Dow, Malcolm M., and E. Anthon Eff. 2009b. Multiple Imputation of Missing Data in Cross-Cultural Samples. Cross-Cultural Research. 43(3):206-229.
- Dow, Malcolm M, and E. Anthon Eff. 2009a. “Cultural Trait Transmission and Missing Data as Sources of Bias in Cross-Cultural Survey Research: Explanations of Polygyny Re-examined.” Cross-Cultural Research. 43(2): 134-151.
- Eff, E. Anthon, and Malcolm M. Dow. 2008. “Do Markets Promote Prosocial Behavior? Evidence from the Standard Cross-Cultural Sample.” http://econpapers.repec.org/paper/mtswpaper/200803.htm.
- Eff, E. Anthon, and Malcolm M. Dow. 2009. Market integration and pro-social behavior. To appear in Robert C. Marshall, Editor. Cooperation in Economic and Social Life. Society for Economic Anthropology Monographs Vol 26. AltaMira Press: Walnut Creek, CA.
- Eff, E. Anthon. 2008. "Weight Matrices for Cultural Proximity: Deriving Weights from a Language Phylogeny." Structure and Dynamics: eJournal of Anthropological and Related Sciences 3(2), Article 9. http://repositories.cdlib.org/imbs/socdyn/sdeas/vol3/iss2/art9
- King, G., Honaker, J., Joseph, A., & Scheve, K. 2001. Analyzing incomplete political science data: an alternative algorithm for multiple imputation. American Political Science Review 95: 49-69.

## EduMod

- Copy the last unused label (in red) several times and correct the numeric series.
- Copy your code into your EduMod page
- Then Edit the (dep var) in the first unused label (in red), save, and click
- EduMod 0: Sum(valchild) Imputation and Regression
- EduMod 1: Max(valchild) Imputation and Regression
- EduMod 2: polygyny Imputation and Regression
- YOU COPY FROM EduMod-3 (just below) to your own page in this numbered list. Do not use the last red line; instead, edit this list, copy the last line at the bottom, and increment its number. Open EduMod-3 and copy its contents. Return here with the back arrow, then open the red line with the next number and paste your copy there. You can edit your copy, run it in R, and post the results under the results header.
- EduMod-3: (sample dep var, copy from here) Imputation and Regression
- EduMod-4: (your dep var here) Imputation and Regression
- EduMod-5: (your dep var here) Imputation and Regression
- EduMod-6: (your dep var here) Imputation and Regression
- EduMod-7: (your dep var here) Imputation and Regression
- EduMod-8: (your dep var here) Imputation and Regression
- EduMod-9: (your dep var here) Imputation and Regression

### EduMod for macs

- EduMod Mac-3: (sample dep var, copy from here) Imputation and Regression NOT YET READY FOR PRIME TIME, MAC USERS!
- **EduMod Mac-4: (your dep var here) Imputation and Regression**
- EduMod Mac-5: (your dep var here) Imputation and Regression
- EduMod Mac-6: (your dep var here) Imputation and Regression
- EduMod Mac-7: (your dep var here) Imputation and Regression
- EduMod Mac-8: (your dep var here) Imputation and Regression
- EduMod Mac-9: (your dep var here) Imputation and Regression

## ... data for R programs

- For data to run programs 1 and 2, see Imputing_the_data#Eff_and_Dow above.
- The GIS data can be downloaded from http://frank.mtsu.edu/~eaeff/downloads/vaux.Rdata.
- The SCCS se data can be downloaded from http://frank.mtsu.edu/~eaeff/downloads/SCCS.Rdata.
- The R version of the SPSS SCCS data is at http://eclectic.anthrosciences.org/~drwhite/courses/index.html
- The autocorrelation weight matrices used in the Eff and Dow (2009) programs can be downloaded as follows:

- Geographic proximity matrix (25 closest neighbors for each society) http://frank.mtsu.edu/~eaeff/downloads/dist25wm.dta.
- Linguistic proximity matrix http://frank.mtsu.edu/~eaeff/downloads/langwm.dta.

**About this EduMod Page**

- EduMod-0: Sum(valchild) Imputation and Regression
- R Program 1 from Eff and Dow (2009): Make the imputed datasets
- R Program 2 from Eff and Dow (2009): Estimate model, combine results
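
The "combine results" step in Program 2 pools the estimates obtained from each imputed dataset. As a rough illustration, the sketch below applies the standard combining rules from Rubin (1987) to a single coefficient: the pooled estimate is the mean of the per-dataset estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. It is written in Python for illustration and is not the Eff and Dow R code, which may differ in detail. The function name `pool_rubin` and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine one coefficient's results across m imputed datasets.

    estimates: the m point estimates of the coefficient
    variances: the m squared standard errors of that coefficient
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching results from at least two imputed datasets")
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # within-imputation variance
    b = variance(estimates)           # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b   # total variance of the pooled estimate
    return q_bar, total


# Made-up estimates and squared standard errors from three imputed datasets.
q, t = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q, t ** 0.5)  # pooled estimate and its standard error
```

Significance tests on the pooled estimate also need an adjusted degrees-of-freedom calculation, which this sketch leaves out.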

## mix and mice R packages and data for R programs

- mix Schafer, Joseph L 2007. mix: Estimation/multiple Imputation for Mixed Categorical and Continuous Data. R package version 1.0-6. http://www.stat.psu.edu/~jls/misoftwa.html
- mice Van Buuren, S. & C.G.M. Oudshoorn 2007. mice: Multivariate Imputation by Chained Equations. R package version 1.16. http://web.inter.nl.net/users/S.van.Buuren/mi/hmtl/mice.htm
- These are used in Eff and Dow (2009).

## Links to SPSS

- Comparative research tools
- Standard Cross-Cultural Sample
- Human Social Complexity and World Cultures 2009 UCI course: fall
- SCCS index of variables for on-line data
- Anthon Eff
- Malcolm Dow
- Galton's problem and autocorrelation

## References

- White's robust variance-covariance matrix referred to on page 11:

- White, Halbert. 1980. A Heteroscedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroscedasticity. Econometrica 48:817-838. See: http://support.sas.com/rnd/app/examples/ets/hetero/index.htm

- Adalta software. 2009. SOLAS 3.0 for Missing Data Analysis. "SOLAS™ 3.0 for Missing Data Analysis offers principled approaches to missing data now has its own scripting language and features a choice of 6 imputation techniques, including 2 Multiple Imputation techniques based on the work of Prof. Donald B. Rubin. Data can be imported from a wide variety of file types including SAS (Unix/Windows), SPSS, Splus, Stata and many more. Once the data is imported, the missing data pattern can be displayed and a decision upon the most appropriate technique made. Once imputation is complete the imputed datasets can be analysed within SOLAS or exported to a variety of other packages in the correct format."
- Das, D., Kelejian, H., & Prucha, I. (2003). Finite sample properties of estimators of spatial autoregressive models with autoregressive disturbances. Papers in Regional Science, 82, 1-26.
- Farebrother, R. W. 1995. The Exact Distribution of the Lagrange Multiplier Test for Heteroskedasticity. Econometric Theory 11(04):803-804.
- Kelejian, H., & Prucha, I. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics, 17, 99-121.
- Kelejian, H., Prucha, I., & Yuzefovich, Y. (2004). Instrumental variable estimation of a spatial autoregressive model with autoregressive disturbances: Large and small sample results. In J. P. LeSage & R. Pace (Eds.), Spatial and spatiotemporal econometrics (Vol. 18, Advances in Econometrics, pp. 163-198). Oxford, UK: Elsevier.
- Kelejian, H., & Robinson, D. (1993). A suggested method of estimation for spatial interdependent models with autocorrelated errors, and an application to a county expenditures model. Papers in Regional Science, 72, 297-312.
- Land, K., & Deane, G. (1992). On the large-sample estimation of regression models with spatial or network-effects terms: A two-stage least squares approach. In P. Marsden (Ed.), Sociological methodology 1992 (pp. 221-248). Oxford, UK: Blackwell.
- Ramsey, J.B. (1969) "Tests for Specification Errors in Classical Linear Least Squares Regression Analysis", J. Roy. Statist. Soc. B., 31(2), 350–371. See: Wikipedia:Ramsey_RESET_test
- Rubin, Donald B. 1976. Inference and missing data. Biometrika 63(3):581-592; doi:10.1093/biomet/63.3.581
- Rubin, Donald B. 1977. “Formalizing Subjective Notions about the Effect of Nonrespondents in Sample Surveys.” Journal of the American Statistical Association 72: 538-543.
- See Also MCMC, MI, Chained Equations: Paul Zhang. 2003. Multiple Imputation: Theory and Method. Internat. Statist. Rev. 71(3): 581-592.
- Rubin, Donald B. 1987. Multiple imputation for Nonresponse in Surveys. New York: John Wiley & Sons.
- Rubin, Donald B. (1996) Multiple Imputation after 18+ Years (with discussion). Journal of the American Statistical Association 91: 473-489.
- Rubin, Donald B. (1981), The Bayesian Bootstrap, The Annals of Statistics, 9, 130-134.
- Rubin, Donald B. >2000. Software for Multiple imputation Statistical Solutions.
- Rubin, Donald B. 2004. The Design of a General and Flexible System for Handling Nonresponse in Sample Surveys. The American Statistician, 58(4): 298–302.
- See also: http://www.ssc.upenn.edu/~allison/MultInt99.pdf
- See also: Catherine Montalto and Sherman Hanna (post 1987) Can repeated-imputation inference (RII) techniques be used with nonlinear models? The short answer is YES, RII techniques are applicable to both linear and nonlinear models. The criteria for determining whether RII techniques are appropriate is independent of the functional form of the estimating equation. RII techniques are appropriate whenever the complete-data analysis inferences are based on estimates and standard errors. These estimates can include population means, variances, correlations, factor loadings, and regression coefficients.

- Shapiro, S. S. and Wilk, M. B. (1965). "An analysis of variance test for normality (complete samples)", Biometrika, Vol. 52, No. 3/4, pages 591–611. JSTOR: 2333709. See: Wikipedia:Shapiro–Wilk test