Please use this identifier to cite or link to this item: https://hdl.handle.net/10620/17921
Full metadata record
dc.contributor.author: Lee, Katherine
dc.contributor.author: Nguyen, Cattram
dc.contributor.author: Carlin, John
dc.date.accessioned: 2019-04-13T03:39:29Z
dc.date.accessioned: 2015-04-14T02:59:21Z
dc.date.available: 2015-04-14T02:59:21Z
dc.date.issued: 2013-08-27
dc.identifier.uri: https://hdl.handle.net/10620/17921
dc.identifier.uri: http://hdl.handle.net/10620/4102
dc.description.abstract:
Aims: Multiple imputation (MI) is gaining popularity as a strategy for handling missing data, but there is a scarcity of tools for checking the models that are used to generate the imputed data. Posterior predictive checking (PPC) has been recommended as a potential method for checking imputation models. PPC involves simulating "replicated" data from the posterior predictive distribution of the model under scrutiny. Model fit is assessed by examining whether the analysis from the observed data looks typical of results obtained from the replicates produced by the model. The aim of this study was to evaluate the performance of PPC as an imputation diagnostic. Using simulation methods, we examined whether PPC p-values could reliably flag imputation models that led to biased study results.
Methods: Our simulations were based on data from the Longitudinal Study of Australian Children (LSAC). We artificially created missingness in the LSAC dataset, and imputed the missing data using four different imputation models. The first model was considered to be the optimal model, based on recommendations for constructing imputation models in the literature. We deliberately misspecified the other three models to determine whether PPC was effective in flagging the misspecification. Model misspecifications included the omission of the outcome variable from the imputation model, the omission of auxiliary variables from the imputation model, and the failure to de-skew variables prior to imputation. We varied the amount of missing data, the missing data models and the target analyses of interest, and evaluated the performance of PPC in diagnosing issues with the imputation models under these different scenarios. We used PPC p-values as our summary diagnostic measure, where extreme p-values (i.e. p-values close to 0 or 1) suggest a misfit between the model and the data.
Results: For all study scenarios, we found that PPC p-values became more extreme as bias and root mean square error in the MI results increased. This is a promising result, since PPC p-values were correctly flagging imputation models that were performing worse. However, as the amount of missing data increased, PPC p-values became less extreme (closer to 0.5). Thus, with larger amounts of missing data, PPC may have a reduced ability to detect misspecified models.
Conclusion: PPC appears to be a potentially valuable imputation diagnostic, since the magnitude of the p-values correlated with MI performance. However, PPC p-values are influenced by the proportion of missing data, becoming less extreme with increasing amounts of missing data. This is a drawback of the PPC diagnostic method, as users will typically be more concerned about their imputation results with larger amounts of missing data.
dc.subject: Surveys and Survey Methodology
dc.subject: Surveys and Survey Methodology -- Survey comparison
dc.subject.classification: Surveys and Survey Methodology
dc.title: Diagnosing problems with imputation models using posterior predictive checking: A simulation study using data from the Longitudinal Study of Australian Children
dc.type: Conference Presentations
dc.identifier.url: http://www.iscb2013.info/programme.html
dc.identifier.survey: LSAC
dc.description.keywords: Missing data
dc.description.keywords: Multiple imputation
dc.description.keywords: Posterior predictive checking
dc.description.conferencelocation: Munich, Germany
dc.description.conferencename: 34th Annual Conference of the International Society for Clinical Biostatistics
dc.identifier.refereed: Yes
local.identifier.id: 4601
dc.description.format: Oral presentation
dc.description.additionalinfo: Received student award to travel to Germany to present at the ISCB conference
dc.date.conferencestart: 2013-08-25
dc.date.conferencefinish: 2013-08-29
dc.date.presentation: 2013-08-27
dc.subject.dss: Surveys and survey methodology
dc.subject.dssmaincategory: Surveys and Survey Methodology
dc.subject.dsssubcategory: Survey comparison
dc.subject.flosse: Surveys and Survey Methodology
dc.relation.survey: LSAC
dc.old.surveyvalue: LSAC
item.openairetype: Conference Presentations
item.openairecristype: http://purl.org/coar/resource_type/c_18cf
item.cerifentitytype: Publications
item.grantfulltext: none
item.fulltext: No Fulltext
Appears in Collections: Conference Presentations
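
The abstract above describes posterior predictive checking in prose. As a purely illustrative sketch, and not the authors' actual LSAC procedure, the following Python code computes a PPC p-value for a toy imputation model. The simulated data, the linear imputation model, the choice of the outcome mean as the test quantity, and the use of plug-in parameter estimates rather than full posterior draws are all simplifying assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: outcome y depends linearly on a covariate x.
n = 500
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

# Make ~30% of y missing completely at random.
miss = rng.random(n) < 0.3
y_obs = np.where(miss, np.nan, y)
obs = ~np.isnan(y_obs)

def fit_normal_regression(x, y):
    """OLS fit of y on x; returns coefficients and residual SD."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma = (y - X @ beta).std(ddof=2)
    return beta, sigma

# Imputation model under scrutiny: y ~ x, fitted to the observed cases.
# (Omitting x here would mimic one of the deliberate misspecifications
# examined in the study.)
beta, sigma = fit_normal_regression(x[obs], y_obs[obs])

n_reps = 200
t_completed = np.empty(n_reps)   # test quantity on imputed (completed) data
t_replicated = np.empty(n_reps)  # test quantity on fully replicated data
for r in range(n_reps):
    # Impute the missing y values from the model's predictive distribution.
    # NOTE: a full PPC would also redraw (beta, sigma) from their posterior;
    # plugging in point estimates is a simplification for this sketch.
    y_imp = y_obs.copy()
    y_imp[~obs] = beta[0] + beta[1] * x[~obs] + sigma * rng.normal(size=(~obs).sum())
    # Replicate the entire outcome vector from the same model.
    y_rep = beta[0] + beta[1] * x + sigma * rng.normal(size=n)
    # Test quantity: the analysis estimate of interest (here, the mean of y).
    t_completed[r] = y_imp.mean()
    t_replicated[r] = y_rep.mean()

# PPC p-value: how often the replicated statistic exceeds the completed one.
# Values near 0 or 1 suggest the imputation model does not fit the data.
p = np.mean(t_replicated >= t_completed)
print(f"PPC p-value: {p:.3f}")
```

Refitting with a deliberately misspecified imputation model, for example one that drops x so the imputations ignore the covariate, should push the p-value toward 0 or 1, which is the behaviour the study exploits as a diagnostic. As the abstract notes, these p-values also drift back toward 0.5 as the missingness fraction grows.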