Imputing data for Regression Analysis

From InterSciWiki

Author(s) Douglas R. White, Anthon Eff, Malcolm Dow, 2009

Eff and Dow

0. Make Directory (Mac users: see EduMod for macs). If C:My Documents/MI does not already contain the R files for your analysis, create the directory C:My Documents/MI and download the files to it. Open "My Computer", navigate to C:My Documents/MI, and click 2SLS (the zip file). With the cursor inside the zip file, press Control-A to highlight all files and click "Copy". Then place the cursor in the C:My Documents/MI directory (outside the zip file) and paste (Control-V). This should copy all the files the programs need into the C:My Documents/MI directory.
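The copy step can also be scripted instead of done by hand. The following is a minimal Python sketch, not part of the original course materials: the function name `extract_r_files` and the paths in the usage comment are hypothetical, and it assumes the 2SLS zip file has already been downloaded.

```python
import zipfile
from pathlib import Path


def extract_r_files(zip_path, target_dir):
    """Create target_dir if it is missing and unpack every file in the zip into it.

    Returns the sorted list of archive member names that were extracted.
    """
    target = Path(target_dir)
    # Equivalent to "make directory": no error if it already exists.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Equivalent to select-all, copy, and paste outside the zip file.
        archive.extractall(target)
        return sorted(archive.namelist())


# Hypothetical usage (substitute your own download location and MI directory):
# extract_r_files("path/to/2SLS.zip", "path/to/MI")
```

Extracting directly into the target directory avoids the common mistake of running the programs from inside the zip file, where R cannot read the files.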
1. Make the imputed datasets
2. Estimate model, combine results
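Step 2 combines the estimates obtained from each imputed dataset into a single result. A minimal sketch of the standard combining rules from Rubin (1987) follows, written in Python for illustration rather than in the R of the course programs. The function name `pool_rubin` and the numbers in the usage comment are hypothetical, and the course programs may combine results differently.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates  : the coefficient estimated on each imputed dataset
    std_errors : the matching standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = mean(estimates)                      # average of the m estimates
    within = mean([se ** 2 for se in std_errors]) # average sampling variance
    between = variance(estimates)                 # variance across imputations
    total = within + (1 + 1 / m) * between        # total variance
    return pooled, sqrt(total)


# Hypothetical usage with three imputed datasets:
# estimate, se = pool_rubin([0.41, 0.38, 0.45], [0.10, 0.11, 0.10])
```

The between-imputation term is what makes the pooled standard error larger than any single-dataset standard error when the imputations disagree, which is how the uncertainty due to missing data enters the final result.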
SCCS Variables in R for your assignment
Human Social Complexity and World Cultures 2009 (2009 Course website)
Human_Social_Complexity_-_World_Cultures_Survey_2010#Overview (2010 Course website)
For an EduMod example of a student paper using a variant of this method, see Sarah Baitzel's project, "Finding a better model for Eff's study of Average Adult Female Contributions to Subsistence". It does not use the same method but an earlier one developed by Anthon Eff.



EduMod for 2009 Classroom lab001 and PCs at home

EduMod for Winter 2010 Classroom lab002 and PCs at home

EduMod for Fall 2010 Classroom lab003 and PCs at home

click the image to see the program that produced 473-475mapValue_of_Boys
click the image to see the program that produced 474-476mapValue_of_Girls

EduMod for macs

fall 2010



... data for R programs

  1. Geographic proximity matrix (25 closest neighbors for each society)
  2. Linguistic proximity matrix
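To illustrate what a "25 closest neighbors" proximity matrix looks like, here is a minimal Python sketch. It is an assumption-laden illustration, not the construction used for the course data: the function names are hypothetical, distances are great-circle distances from latitude and longitude, and each of the k nearest neighbors receives an equal weight of 1/k so that rows sum to one. The actual matrix may weight neighbors by distance instead.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(min(1.0, h)))


def knn_weight_matrix(coords, k=25):
    """Row-normalized matrix with weight 1/k on each society's k nearest neighbors.

    Equal weighting is an assumption made for this sketch.
    """
    n = len(coords)
    k = min(k, n - 1)  # cannot have more neighbors than other societies
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: haversine_km(coords[i], coords[j]),
        )
        for j in others[:k]:
            weights[i][j] = 1.0 / k
    return weights
```

Note that such a matrix is generally not symmetric: society A can be among B's nearest neighbors without B being among A's.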

mix and mice R packages and data for R programs

Links to SPSS


  • White, Halbert. 1980. A Heteroscedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroscedasticity. Econometrica 48:817-838.
  • Anselin, L., Bera, A. K., Florax, R. and Yoon, M. J. 1996. Simple diagnostic tests for spatial dependence. Regional Science and Urban Economics, 26, 77–104.
  • Adalta software. 2009. SOLAS 3.0 for Missing Data Analysis. "SOLAS™ 3.0 for Missing Data Analysis offers principled approaches to missing data now has its own scripting language and features a choice of 6 imputation techniques, including 2 Multiple Imputation techniques based on the work of Prof. Donald B. Rubin. Data can be imported from a wide variety of file types including SAS (Unix/Windows), SPSS, Splus, Stata and many more. Once the data is imported, the missing data pattern can be displayed and a decision upon the most appropriate technique made. Once imputation is complete the imputed datasets can be analysed within SOLAS or exported to a variety of other packages in the correct format."
  • Das, D., Kelejian, H., & Prucha, I. (2003). Finite sample properties of estimators of spatial autoregressive models with autoregressive disturbances. Papers in Regional Science, 82, 1-26.
  • Farebrother, R. W. 1995. The Exact Distribution of the Lagrange Multiplier Test for Heteroskedasticity. Econometric Theory 11(04):803-804.
  • Kelejian, H., & Prucha, I. (1998). A generalized two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics, 17, 99-121.
  • Kelejian, H., Prucha, I., & Yuzefovich, Y. (2004). Instrumental variable estimation of a spatial autoregressive model with autoregressive disturbances: Large and small sample results. In J. P. LeSage & R. Pace (Eds.), Spatial and spatiotemporal econometrics (Vol. 18, Advances in Econometrics, pp. 163-198). Oxford, UK: Elsevier.
  • Kelejian, H., & Robinson, D. (1993). A suggested method of estimation for spatial interdependent models with autocorrelated errors, and an application to a county expenditures model. Papers in Regional Science, 72, 297-312.
  • Land, K., & Deane, G. (1992). On the large-sample estimation of regression models with spatial or network-effects terms: A two-stage least squares approach. In P. Marsden (Ed.), Sociological methodology 1992 (pp. 221-248). Oxford, UK: Blackwell.
  • Ramsey, J.B. (1969) "Tests for Specification Errors in Classical Linear Least Squares Regression Analysis", J. Roy. Statist. Soc. B., 31(2), 350–371. See: Wikipedia:Ramsey_RESET_test
  • Rubin, Donald B. 1976. Inference and missing data. Biometrika 63(3):581-592; doi:10.1093/biomet/63.3.581
  • Rubin, Donald B. 1977. "Formalizing Subjective Notions about the Effect of Nonrespondents in Sample Surveys." Journal of the American Statistical Association 72: 538-543.
  • See Also MCMC, MI, Chained Equations: Paul Zhang. 2003. Multiple Imputation: Theory and Method. Internat. Statist. Rev. 71(3): 581-592.
  • Rubin, Donald B. 1987. Multiple imputation for Nonresponse in Surveys. New York: John Wiley & Sons.
  • Rubin, Donald B. (1996) Multiple Imputation after 18+ Years (with discussion). Journal of the American Statistical Association 91: 473-489.
  • Rubin, Donald B. (1981), The Bayesian Bootstrap, The Annals of Statistics, 9, 130-134.
  • Rubin, Donald B. >2000. Software for Multiple Imputation. Statistical Solutions.
  • Rubin, Donald B. 2004. The Design of a General and Flexible System for Handling Nonresponse in Sample Surveys. The American Statistician, 58(4): 298–302.
  • See also: Catherine Montalto and Sherman Hanna (post 1987). Can repeated-imputation inference (RII) techniques be used with nonlinear models? The short answer is yes: RII techniques are applicable to both linear and nonlinear models. The criteria for determining whether RII techniques are appropriate are independent of the functional form of the estimating equation. RII techniques are appropriate whenever the complete-data analysis inferences are based on estimates and standard errors. These estimates can include population means, variances, correlations, factor loadings, and regression coefficients.
  • Shapiro, S. S. and Wilk, M. B. (1965). "An analysis of variance test for normality (complete samples)", Biometrika, Vol. 52, No. 3/4, pages 591–611. JSTOR: 2333709. See: Wikipedia:Shapiro–Wilk test