Sample-wise Combined Missing Effect Model with Penalization

J Comput Graph Stat. 2023;32(1):263-274. doi: 10.1080/10618600.2022.2070172. Epub 2022 May 26.

Abstract

Modern high-dimensional statistical inference often faces the problem of missing data. In recent decades, many studies have addressed this problem with strategies such as complete-sample analysis and imputation. However, complete-sample analysis discards the information in incomplete samples, while imputation procedures accumulate errors from each individual imputation. In this paper, we propose a new method, the Sample-wise COmbined missing effect Model with penalization (SCOM), to handle missing data in predictors. Instead of imputing the predictors, SCOM estimates, for each incomplete sample, the combined effect of all of its missing data. SCOM makes full use of all available data and is robust to various missingness mechanisms. Theoretical results establish an oracle inequality for the proposed estimator, as well as the consistency of variable selection and of combined missing effect selection. Simulation studies and an application to the Residential Building Data illustrate the effectiveness of SCOM.

Keywords: Imputation; Lasso; Linear regression; Missing data; Ridge regression.
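
The core idea in the abstract, estimating one combined missing effect per incomplete sample instead of imputing predictors, can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes missing entries are zero-filled, each incomplete sample i gets its own offset gamma_i, and both the coefficients and the offsets carry an L1 penalty, so the objective is (1/2n)||y - X0 beta - gamma||^2 + lam_beta ||beta||_1 + (lam_gamma/n) ||gamma||_1. The function name `fit_samplewise_effects`, the penalty choice, the tuning values, and the alternating solver are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * |z|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_samplewise_effects(X, y, lam_beta=0.05, lam_gamma=1.0, n_iter=500):
    """Illustrative fit of y ~ X0 @ beta + gamma (assumed formulation).

    X may contain NaN. Missing entries are zero-filled into X0, and gamma_i
    is a free offset for each incomplete row (fixed at 0 for complete rows).
    """
    n, p = X.shape
    miss = np.isnan(X)
    incomplete = miss.any(axis=1)          # rows with at least one missing value
    X0 = np.where(miss, 0.0, X)            # zero-fill, no imputation model
    step = n / np.linalg.norm(X0, 2) ** 2  # 1 / Lipschitz constant of the beta-gradient
    beta = np.zeros(p)
    gamma = np.zeros(n)
    for _ in range(n_iter):
        # Exact update of the per-sample offsets given beta.
        resid = y - X0 @ beta
        gamma[incomplete] = soft_threshold(resid[incomplete], lam_gamma)
        # One proximal-gradient (ISTA) step on beta given gamma.
        grad = -X0.T @ (y - gamma - X0 @ beta) / n
        beta = soft_threshold(beta - step * grad, step * lam_beta)
    return beta, gamma


# Synthetic example: 40 of 200 rows lose their first predictor.
rng = np.random.default_rng(0)
n_obs, n_feat = 200, 10
X_full = rng.normal(size=(n_obs, n_feat))
beta_true = np.zeros(n_feat)
beta_true[:2] = [2.0, -1.5]
y = X_full @ beta_true + 0.1 * rng.normal(size=n_obs)

X = X_full.copy()
rows = rng.choice(n_obs, size=40, replace=False)
X[rows, 0] = np.nan

beta_hat, gamma_hat = fit_samplewise_effects(X, y)
```

In this sketch, `gamma_hat` is exactly zero for complete rows, and for incomplete rows it absorbs the part of the response that the zero-filled predictors cannot explain. A ridge penalty on the offsets would replace the soft-thresholding of the residual with a proportional shrinkage; which penalties the paper actually uses is not stated in the abstract.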