Longitudinal Study: LSAC
Title: Diagnosing problems with imputation models using posterior predictive checking: A simulation study using data from the Longitudinal Study of Australian Children
Authors: Lee, Katherine 
Nguyen, Cattram 
Carlin, John 
Publication Date: 27-Aug-2013
Keywords: Missing data
Multiple imputation
Posterior predictive checking
Abstract:
Aims: Multiple imputation (MI) is gaining popularity as a strategy for handling missing data, but there is a scarcity of tools for checking the models that are used to generate the imputed data. Posterior predictive checking (PPC) has been recommended as a potential method for checking imputation models. PPC involves simulating "replicated" data from the posterior predictive distribution of the model under scrutiny. Model fit is assessed by examining whether the analysis from the observed data looks typical of results obtained from the replicates produced by the model. The aim of this study was to evaluate the performance of PPC as an imputation diagnostic. Using simulation methods, we examined whether PPC p-values could reliably flag imputation models that led to biased study results.
Methods: Our simulations were based on data from the Longitudinal Study of Australian Children (LSAC). We artificially created missingness in the LSAC dataset and imputed the missing data using four different imputation models. The first model was considered to be the optimal model, based on recommendations for constructing imputation models in the literature. We deliberately misspecified the other three models to determine whether PPC was effective in flagging the misspecification. Model misspecifications included the omission of the outcome variable from the imputation model, the omission of auxiliary variables from the imputation model, and the failure to de-skew variables prior to imputation. We varied the amount of missing data, the missing data models and the target analyses of interest, and evaluated the performance of PPC in diagnosing issues with the imputation models under these different scenarios. We used PPC p-values as our summary diagnostic measure, where extreme p-values (i.e. p-values close to 0 or 1) suggest a misfit between the model and the data.
Results: For all study scenarios, we found that PPC p-values became more extreme as bias and root mean square error in the MI results increased. This is a promising result, since PPC p-values were correctly flagging imputation models that were performing worse. However, as the amount of missing data increased, PPC p-values became less extreme (closer to 0.5). Thus, with larger amounts of missing data, PPC may have a reduced ability to detect misspecified models.
Conclusion: PPC appears to be a potentially valuable imputation diagnostic, since the magnitude of the p-values correlated with MI performance. However, the PPC p-values are influenced by the proportion of missing data, with p-values becoming less extreme with increasing amounts of missing data. This is a drawback of the PPC diagnostic method, as users will typically be more concerned about their imputation results with larger amounts of missing data.
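The PPC procedure described in the abstract can be illustrated with a minimal sketch. This is not the study's actual imputation models or target analyses: it assumes a simple normal imputation model with a flat prior, uses sample skewness as the discrepancy statistic (a choice suited to detecting the "failure to de-skew" misspecification mentioned above), and the function name `ppc_pvalue` is hypothetical. For each posterior draw it imputes the missing values to form a completed dataset, generates a fully replicated dataset from the same model, and reports the proportion of draws in which the replicated statistic exceeds the completed-data statistic; values near 0 or 1 flag model misfit.

```python
import numpy as np

def skewness(y):
    """Sample skewness: third central moment over variance^(3/2)."""
    d = y - y.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

def ppc_pvalue(y_obs, n_mis, n_draws=2000, seed=0):
    """Posterior predictive p-value for a normal imputation model.

    Illustrative only: assumes y ~ Normal(mu, sigma^2) with a flat
    prior on (mu, log sigma), so the posterior is
    sigma^2 | y ~ (n-1) s^2 / chi2_{n-1} and
    mu | sigma^2, y ~ Normal(ybar, sigma^2 / n).
    """
    rng = np.random.default_rng(seed)
    n_obs = len(y_obs)
    ybar, s2 = y_obs.mean(), y_obs.var(ddof=1)
    exceed = 0
    for _ in range(n_draws):
        # Draw (mu, sigma^2) from the posterior given the observed data.
        sigma2 = s2 * (n_obs - 1) / rng.chisquare(n_obs - 1)
        mu = rng.normal(ybar, np.sqrt(sigma2 / n_obs))
        # Completed data: real observed values plus imputed missing values.
        y_mis = rng.normal(mu, np.sqrt(sigma2), n_mis)
        t_com = skewness(np.concatenate([y_obs, y_mis]))
        # Replicated data: every value regenerated from the model.
        t_rep = skewness(rng.normal(mu, np.sqrt(sigma2), n_obs + n_mis))
        exceed += t_rep >= t_com
    return exceed / n_draws
```

Under this setup, roughly symmetric observed data yield an unremarkable p-value, while strongly skewed observed data imputed with a normal model yield a p-value near 0, because the completed data stay skewed while the replicates do not. This mirrors the study's use of extreme p-values as the flag for misspecification, though the study's statistics were the target analysis estimates rather than skewness.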
Conference: 34th Annual Conference of the International Society for Clinical Biostatistics
Conference location: Munich, Germany
Keywords: Surveys and Survey Methodology; Surveys and Survey Methodology -- Survey comparison
Research collection: Conference Presentations
Appears in Collections: Conference Presentations
