Nonparametric Mass Imputation for Data Integration

J Surv Stat Methodol. 2020 Nov 17;10(1):1-24. doi: 10.1093/jssam/smaa036. eCollection 2022 Feb.

Abstract

Data integration that combines a probability sample with a nonprobability sample is an emerging area of research in survey sampling. We consider the case where the study variable of interest is measured only in the nonprobability sample, but comparable auxiliary information is available in both data sources. In this setting, we apply mass imputation to the probability sample, using the nonprobability data as the training set for imputation. Parametric mass imputation is sensitive to its parametric model assumptions. To develop improved and more robust methods, we consider nonparametric mass imputation for data integration. In particular, we use kernel smoothing for imputation when the covariate is low-dimensional and generalized additive models when the covariate is relatively high-dimensional. Asymptotic theory and variance estimation are developed. Simulation studies and real applications show the benefits of the proposed methods over their parametric counterparts.
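
As an illustration of the kernel-smoothing idea for a single covariate, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian kernel, a user-supplied bandwidth, and a weight-normalized (Hajek-type) estimator of the population mean; the function names `nw_predict` and `mass_imputation_mean` are hypothetical and chosen only for this example.

```python
import numpy as np


def nw_predict(x_train, y_train, x_new, bandwidth):
    """Nadaraya-Watson kernel regression with a Gaussian kernel (assumed choice)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Scaled distances between each new point (rows) and each training point (columns).
    u = (x_new[:, None] - x_train[None, :]) / bandwidth
    k = np.exp(-0.5 * u ** 2)
    # Kernel-weighted average of the training responses for each new point.
    return (k @ y_train) / k.sum(axis=1)


def mass_imputation_mean(x_prob, weights, x_nonprob, y_nonprob, bandwidth):
    """Impute y for every probability-sample unit, then take a weighted mean.

    The nonprobability sample (x_nonprob, y_nonprob) is the training set;
    `weights` are the design weights of the probability sample.
    Normalizing by the sum of the weights is an assumption of this sketch.
    """
    y_imputed = nw_predict(x_nonprob, y_nonprob, x_prob, bandwidth)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * y_imputed) / np.sum(weights))
```

In this sketch the study variable is never observed in the probability sample: every one of its units receives an imputed value, and only the design weights and covariates of that sample enter the final estimate. Bandwidth selection and variance estimation are omitted here.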

Keywords: Approximate Bayesian; Generalized additive model; Hybrid bootstrap; Kernel smoothing; Missingness at random; Nonprobability sample.