Project Details

Robust and efficient multiple imputation of complex data sets

Subject Area Empirical Social Research
Term from 2010 to 2012
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 162411054
 
Final Report Year 2014

Final Report Abstract

The results of the project work to date can be summarized as follows.

• Multiple imputation (MI) as a method to compensate for missing values is theoretically justified and feasible in many situations where other theoretically justified methods are unavailable or infeasible.

• With the exception of predictive mean matching (PMM), MI as currently implemented in available software does not account for nonlinear relationships between the variables to be imputed and the covariates. Available software can be applied safely only when the metric variables with missing values are known to be jointly normally distributed (norm; Schafer, 1997) or when only one metric variable is to be imputed (e.g., norm, mice, IVEware). In all other cases the imputation model for metric variables is misspecified. Whether this misspecification leads to invalid inferences in the analyses of scientific interest depends on various factors. Broadly speaking, generating multiple imputations from a misspecified imputation model seems to be robust if the model is only slightly misspecified. If the misspecification is more severe, however, inferences are invalid. Unfortunately, in applications it is rarely known whether a possible misspecification is minor or not. PMM, on the other hand, fails if k, the number of nearest neighbors, is not correctly specified.

• The results of our work imply that the non- and semiparametric methods are superior to those currently available in that they are more flexible. They automatically account for nonlinearities, while the loss in efficiency is only minor if the relationships are in fact linear.

• However, the proposed methods are no panacea either. One limitation is that high-dimensional semiparametric imputation models will fail due to the curse of dimensionality. Thus, we still have to assume an additive structure of the imputation model. The proposed methods are nevertheless an improvement over existing methods: they allow the generation of proper imputations in situations in which standard imputation methods fail.

• Another problem that has not been recognized as an important topic in the MI community is how to generate proper imputations in multivalued imputation problems. Although it may not be the rule that a covariate with missing values has, e.g., a U-shaped relationship with the dependent variable in the model of scientific interest, a small selection of articles suggests that such a relationship is not uncommon.

• Although Rubin's theory (1987) is based on the use of (asymptotically) efficient or at least self-efficient estimators, preliminary simulation results suggest that MI may work for semiparametric estimators which are not (asymptotically) efficient.

• Adopting mixed effects models to generate imputations for binary variables seems feasible, although the distributional assumptions probably need to be weakened.

• As many covariates as possible should be included in the imputation model in the least restrictive way, as leaving out important variables or misspecifying the functional form overstrains the 'self-correcting' property of MI. On the other hand, including too many variables may lead to multicollinearity and large standard errors in the final analyses of interest.

Given the last point, work in the remaining months of the ongoing project will be devoted to implementing and testing a method that reduces the covariate terms in the imputation model to those that substantially improve the predictions. The evaluation of the estimated variances of the finally adopted estimator will then probably be an issue.
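To make the role of k in PMM concrete, the following is a minimal Python sketch of predictive mean matching for a single metric variable. It is an illustration only, not the project's method or the implementation found in norm, mice, or IVEware. The function name `pmm_impute`, the linear working model, the bootstrap step used in place of a Bayesian parameter draw, and the simulated U-shaped data are all assumptions made for this example.

```python
import numpy as np

def pmm_impute(x, y, k=5, rng=None):
    """Hypothetical helper: impute NaN entries of y from covariate x by PMM."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])  # linear working model (assumption)

    # Assumption: a bootstrap of the observed cases stands in for a
    # posterior draw of the regression parameters.
    idx = rng.choice(np.flatnonzero(obs), size=obs.sum(), replace=True)
    beta_star, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    beta_hat, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)

    pred_obs = X[obs] @ beta_hat    # predicted means of the potential donors
    pred_mis = X[~obs] @ beta_star  # predicted means of the incomplete cases
    y_obs = y[obs]

    donors = []
    for p in pred_mis:
        # The k observed cases whose predicted means are closest to p
        nearest = np.argsort(np.abs(pred_obs - p))[:k]
        # Impute the observed value of one randomly chosen donor
        donors.append(y_obs[rng.choice(nearest)])

    y_imp = y.copy()
    y_imp[~obs] = donors
    return y_imp

# Usage on simulated data with a U-shaped relationship (assumed for illustration)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x**2 + rng.normal(scale=0.1, size=200)
y[rng.random(200) < 0.3] = np.nan
imputations = [pmm_impute(x, y, k=5, rng=rng) for _ in range(5)]
```

Because every imputed value is an observed value borrowed from a donor, the imputations stay within the observed range of y. The quality of the match, however, depends on the working model and on the choice of k, which is the sensitivity noted in the abstract.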
