Please use this identifier to cite or link to this item:
https://hdl.handle.net/10620/17845
Longitudinal Study: | LSAC |
Title: | Diagnostic methods for checking multiple imputation models |
Authors: | Nguyen, Cattram |
Institution: | University of Melbourne |
Publication Date: | 15-Feb-2015 |
Keywords: | multiple imputation; posterior predictive checking; diagnostics; model checking; missing data |
Abstract: | Multiple imputation is an increasingly popular method for handling missing data. A key task in the imputation process is the specification of a model for generating imputations. The validity of imputation-based inferences depends on the adequacy of this imputation model. Constructing imputation models is not straightforward and requires careful decision-making. The imputer must decide, for example, which variables to include in the imputation model and what functional form these variables should take. In many cases, there is no consensus in the literature to inform the modelling decisions. If the imputation model is poorly specified, such as through the omission of important variables, this can lead to biased results. It is therefore important that researchers check the goodness-of-fit of their imputation models. Despite the popularity of multiple imputation, the checking of imputation models is not widespread. This may primarily be due to the scarcity of guidelines and computational tools for performing imputation diagnostics. Although some diagnostic methods have been proposed in the literature, very few studies have formally evaluated whether the proposed techniques are useful for identifying problems with imputation models. Thus, we have found ourselves in an environment where the wide availability of multiple imputation is coupled with a lack of software and guidelines for assessing the adequacy of the models used in this process. The current research addressed this knowledge gap by evaluating diagnostic methods for checking imputation models. This included an examination of proposed methods including graphical diagnostics, the Kolmogorov-Smirnov test and posterior predictive checking. These techniques were evaluated using simulation experiments and they were illustrated using data from the Longitudinal Study of Australian Children. The investigations in this thesis revealed both advantages and disadvantages of all evaluated diagnostics. 
The graphical checks were useful for exploring the imputed values, but it was challenging to apply them routinely to all imputed variables, especially when working on large-scale datasets. The Kolmogorov-Smirnov diagnostic was straightforward to implement, but it had limited usefulness when the data were missing at random, an assumption which is commonly made when performing multiple imputation. Posterior predictive checking was preferable to methods that focus on the plausibility of imputations, because it checks the fit of the model with respect to quantities of scientific interest. Posterior predictive checking successfully identified model misspecifications such as the omission of the outcome variable from the imputation model. However, users of posterior predictive checking need to be aware of the shortcomings of this approach, particularly its reduced usefulness in the presence of large amounts of missing data. Given that all of the evaluated methods were imperfect, there is a need for further development and evaluation of diagnostic techniques for checking imputation models. In addition, rather than expecting any individual diagnostic to provide a complete solution, it might be preferable to treat each of these techniques as separate elements of a diagnostic toolkit. When applied together, these diagnostic methods can be used to check different aspects of the imputation model. Finally, to encourage the practice of model checking, it is crucial that guidelines for imputation diagnostics are developed and communicated to the research community. It is also important that tools for diagnostic checking are made available in statistical packages that support multiple imputation. This will become increasingly important as multiple imputation becomes further established as a standard missing data method.
| URL: | https://minerva-access.unimelb.edu.au/handle/11343/45205 |
Keywords: | Surveys and Survey Methodology |
Research collection: | Theses and student dissertations |
Appears in Collections: | Theses and student dissertations |