Robust approach for variable selection with high dimensional longitudinal data analysis

Stat Med. 2021 Dec 30;40(30):6835-6854. doi: 10.1002/sim.9213. Epub 2021 Oct 7.

Abstract

This article proposes a new robust smooth-threshold estimating equation to select important variables and automatically estimate parameters for high dimensional longitudinal data. A novel working correlation matrix is proposed to capture correlations within the same subject. The proposed procedure works well when the number of covariates p_n increases as the number of subjects n increases. The proposed estimates are competitive with the estimates obtained with the true correlation structure, especially when the data are contaminated. Moreover, the proposed method is robust against outliers in the response variables and/or covariates. Furthermore, the oracle properties for robust smooth-threshold estimating equations under "large n, diverging p_n" are established under some regularity conditions. Extensive simulation studies and a yeast cell cycle data set are used to evaluate the performance of the proposed method, and results show that the proposed method is competitive with existing robust variable selection procedures.

Keywords: Tukey's biweight method; automatic variable selection; high dimensional covariates; outliers; robustness; working correlation structure.
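
The abstract does not give the estimating equations, so the following is only a minimal sketch of the general idea behind the Tukey's biweight keyword, not the authors' procedure. It shows the standard biweight psi function and the weight it implies, then applies them to a toy robust location estimate by iteratively reweighted averaging. The function names, the tuning constant c = 4.685 (a conventional choice for roughly 95% efficiency under normal errors), the MAD-based scale, and the sample data are all assumptions made for illustration.

```python
import statistics

# Conventional tuning constant (assumption; the abstract does not specify one).
TUKEY_C = 4.685


def tukey_biweight_psi(r, c=TUKEY_C):
    """Biweight psi: r * (1 - (r/c)^2)^2 inside [-c, c], and 0 outside."""
    if abs(r) > c:
        return 0.0
    u = r / c
    return r * (1.0 - u * u) ** 2


def tukey_biweight_weight(r, c=TUKEY_C):
    """Implied weight psi(r) / r, which falls smoothly from 1 to 0 as |r| grows to c."""
    if abs(r) > c:
        return 0.0
    u = r / c
    return (1.0 - u * u) ** 2


def robust_location(values, c=TUKEY_C, tol=1e-8, max_iter=100):
    """Toy location estimate: iteratively reweighted mean with biweight weights.

    Hypothetical helper for illustration only; it is not the smooth-threshold
    estimating equation described in the abstract.
    """
    values = list(values)
    mu = statistics.median(values)
    # Robust scale from the median absolute deviation, rescaled for normal data.
    mad = statistics.median([abs(v - mu) for v in values])
    scale = 1.4826 * mad
    if scale == 0.0:
        return mu
    for _ in range(max_iter):
        weights = [tukey_biweight_weight((v - mu) / scale, c) for v in values]
        total = sum(weights)
        if total == 0.0:
            break
        new_mu = sum(w * v for w, v in zip(weights, values)) / total
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


if __name__ == "__main__":
    sample = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]  # made-up data with one gross outlier
    print("mean:", statistics.mean(sample))
    print("biweight location:", robust_location(sample))
```

Because the weight is exactly zero for standardized residuals beyond c, the gross outlier in the sample contributes nothing to the estimate, whereas the ordinary mean is pulled strongly toward it. This redescending behavior is what makes biweight-type estimating functions resistant to contaminated responses.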

MeSH terms

  • Computer Simulation
  • Data Analysis*
  • Humans
  • Models, Statistical*
  • Research Design