Computational efficiency of multiple kernel imputation
About this project
Many datasets are incomplete, typically because of nonresponse in surveys, and the missingness always has to be accounted for at the analysis phase. If it is handled inadequately, the error, in particular the bias, may become large. At the same time, information associated with both the missingness and the analysis variables is often available, most importantly auxiliary variables observed in the sample or the population, but also information at the population level.
In this project an imputation method called 'Multiple kernel imputation' is further developed. Complex relationships in the data may be hard to model parametrically, both among variables with missing values and among those without. This suggests making few assumptions and letting the data speak for themselves, which the method achieves by imputing only real observed values, thereby preserving multivariate relationships among variables with missingness. The limitations of using only real donors are counteracted mainly by features from kernel estimation, but the computational demand becomes relatively large. The aim of this project is therefore to increase the computational efficiency of the method, and to make it more accessible to the statistical community by making it available in a supplementary package for the open-source software R.
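As a rough illustration of the real-donor idea described above, the following Python sketch fills each missing value with an observed value drawn from a donor, where donors closer in an auxiliary variable receive larger kernel weights, and repeats the draw to produce several completed datasets. It is not the algorithm from the thesis or the planned R package: the function names, the Gaussian kernel, the fixed bandwidth, and the restriction to a single auxiliary variable are all assumptions made for illustration, and the local balancing features of the actual method are omitted.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel (an assumed choice for this sketch)."""
    return math.exp(-0.5 * u * u)


def kernel_donor_impute(x, y, bandwidth=1.0, n_imputations=5, seed=0):
    """Return n_imputations completed copies of y.

    x: auxiliary variable, observed for every unit.
    y: analysis variable, with None marking missing values.
    Each missing y is replaced by the observed y of a donor drawn with
    probability proportional to a kernel weight on the distance in x.
    """
    rng = random.Random(seed)
    donors = [i for i, v in enumerate(y) if v is not None]
    if not donors:
        raise ValueError("no observed values available as donors")

    completed = []
    for _ in range(n_imputations):
        filled = list(y)
        for i, v in enumerate(y):
            if v is not None:
                continue
            weights = [gaussian_kernel((x[i] - x[j]) / bandwidth) for j in donors]
            if sum(weights) == 0.0:
                # All weights underflowed: fall back to the nearest donor.
                j = min(donors, key=lambda d: abs(x[i] - x[d]))
            else:
                j = rng.choices(donors, weights=weights, k=1)[0]
            filled[i] = y[j]  # only real observed values are imputed
        completed.append(filled)
    return completed


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    y = [1.0, None, 4.0, 9.0]
    for dataset in kernel_donor_impute(x, y, bandwidth=1.0, n_imputations=3):
        print(dataset)
```

Every imputed value in this sketch is one that was actually observed, and the kernel weights are recomputed for every missing unit in every imputation. That repeated weighting over all donors hints at why the computational demand grows with the size of the data.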
The method has previously been shown to possess excellent statistical properties when the analysis variables are nonlinear functions of the auxiliary variables, see:
Pettersson, N. (2013). Multiple Kernel Imputation – A Locally Balanced Real Donor Method. PhD Thesis, Stockholm University.