Monte Carlo simulations for evaluating the performance of modern missing data techniques when estimating structural equation models with latent variables. A systematic analysis of different types of multiple imputation.
Final Report Abstract
Using Monte Carlo simulation techniques, this research project evaluates the performance of different variants of multiple imputation (MI) and of non-MI methods for estimating missing values when analyzing structural equation models (SEM). In total, six missing data techniques (MDTs) were applied to three different SEM population models and investigated under various simulation configurations. These configurations included a) data files with different numbers of cases, b) data files with symmetrical and (strongly) asymmetrical value distributions, and c) data files with different proportions of missing data. Besides MI techniques that rest on strict distributional assumptions (multivariate normal distribution), we also tested MI variants that are not subject to these assumptions and explicitly take non-normal value distributions into account (e.g. categorical variables). For comparison, two non-MI MDTs were applied ("Direct Maximum Likelihood estimation" and the "Expectation Maximization method"). To evaluate the performance of all six MDTs, we focused on four fit indices used most prominently in SEM analysis (the p-value of the chi² statistic, SRMR, RMSEA and CFI). We also analyzed the quality of all estimated SEM parameters and their standard errors as well as the relative efficiency of all estimated parameters.
Among the six tested missing data techniques, only two could be identified that deliver very good results under all model and data configurations: "Direct Maximum Likelihood estimation" (the Direct ML method) and the H0 method, a variant of MI that takes the structure of the analyzed model into account when imputing the missing values. Both methods deliver high-quality fit indices when applied to SEM estimation, as well as unbiased SEM parameter estimates and standard errors. Thus, among the MI variants, only the H0 method can be recommended for practical use. In addition, the Direct ML method (a non-MI technique) can be recommended. It integrates the estimation of missing values into model estimation, so there is no need for an initial, separate imputation step.
Although all the other MDTs deliver good and unbiased results when estimating SEM parameters and standard errors, they often generate SEM fit indices that lead to false rejections of SEMs. This is even more problematic for data files with high proportions of missing values (≥ 20%). In such situations, only one SEM fit index can be fully recommended: the SRMR (Standardized Root Mean Square Residual) index.
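For readers less familiar with the fit indices named above, the following is a minimal sketch of how RMSEA, CFI and SRMR are commonly computed from a model chi-square and from the sample and model-implied covariance matrices. It is not the project's simulation code. The function names are hypothetical, the formulas follow common textbook definitions, and SEM programs differ in details (for example, using N instead of N - 1 in RMSEA, or a slightly different standardization in SRMR).

```python
import math


def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation from a model chi-square.

    Assumption: the (n - 1) convention is used; some programs use n.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative Fit Index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    if d_baseline == 0.0:
        # Neither model shows misfit beyond its degrees of freedom.
        return 1.0
    return 1.0 - d_model / d_baseline


def srmr(sample_cov, implied_cov):
    """Standardized Root Mean Square Residual.

    Each covariance matrix is converted to correlations using its own
    variances; the squared differences are averaged over the
    p * (p + 1) / 2 unique elements (lower triangle plus diagonal).
    """
    p = len(sample_cov)
    total = 0.0
    for i in range(p):
        for j in range(i + 1):
            s = sample_cov[i][j] / math.sqrt(sample_cov[i][i] * sample_cov[j][j])
            m = implied_cov[i][j] / math.sqrt(implied_cov[i][i] * implied_cov[j][j])
            total += (s - m) ** 2
    return math.sqrt(total / (p * (p + 1) / 2))
```

RMSEA and CFI are functions of the chi-square statistic, whereas SRMR is computed directly from the residuals between observed and model-implied moments.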
Publications
- Verfahren der Multiplen Imputation bei Schätzung von Strukturgleichungsmodellen mit latenten Variablen. Ein systematischer Vergleich mittels Monte-Carlo-Simulationen. (SISS – Schriftenreihe des Instituts für Sozialwissenschaften der Universität Stuttgart, 2020, Nr. 50).
Wahl, Andreas, Dieter Urban