Covariance-Insured Screening

Kevin He; Jian Kang; Hyokyoung G Hong; Ji Zhu; Yanming Li; Huazhen Lin; Han Xu; Yi Li

doi:10.1016/j.csda.2018.09.001

Comput Stat Data Anal. 2019 Apr:132:100-114. doi: 10.1016/j.csda.2018.09.001. Epub 2018 Sep 22.

Authors

Kevin He¹, Jian Kang¹, Hyokyoung G Hong², Ji Zhu³, Yanming Li¹, Huazhen Lin⁴, Han Xu³, Yi Li¹

Affiliations

¹ Department of Biostatistics, School of Public Health, University of Michigan.
² Department of Statistics and Probability, Michigan State University.
³ Department of Statistics, University of Michigan.
⁴ School of Statistics, Southwestern University of Finance and Economics.

Abstract

Modern bio-technologies have produced a vast amount of high-throughput data with the number of predictors far greater than the sample size. In order to identify more novel biomarkers and understand biological mechanisms, it is vital to detect signals weakly associated with outcomes among ultrahigh-dimensional predictors. However, existing screening methods, which typically ignore correlation information, are likely to miss weak signals. By incorporating the inter-feature dependence, a covariance-insured screening approach is proposed to identify predictors that are jointly informative but marginally weakly associated with outcomes. The validity of the method is examined via extensive simulations and a real data study for selecting potential genetic factors related to the onset of multiple myeloma.

Keywords: Covariance-insured screening; Dimensionality reduction; High-dimensional data; Variable selection.
