Robust Sample-Specific Stability Selection with Effective Error Control

J Comput Biol. 2019 Mar;26(3):202-217. doi: 10.1089/cmb.2018.0180. Epub 2019 Jan 14.

Abstract

Identifying individual characteristics is a crucial issue in personalized genome research. To effectively identify sample-specific characteristics, we propose a novel strategy called robust sample-specific stability selection. Although stability selection yields effective feature selection results and has an attractive theoretical property (i.e., per-family error rate control), its results are sensitive to the value of the regularization parameter, because the method performs feature selection based only on the particular parameter value that maximizes the selection probability. To resolve this issue, we propose robust stability selection and show that our method retains an effective theoretical property (i.e., effective per-family error rate control). We also propose a sample-specific random lasso based on kernel-based L1-type regularization and weighted random sampling. The proposed robust sample-specific stability selection estimates the selection probabilities of variables using the sample-specific random lasso and then selects variables based on robust stability selection. Our method controls the influence of each sample on the sample-specific analysis through a two-stage strategy (i.e., weighted random sampling and the kernel-based L1-type approach in the random lasso), so that sample-specific analysis can be performed effectively without interference from samples whose characteristics differ from those of the target sample. Numerical studies show that our strategies perform sample-specific analysis effectively and provide biologically reliable results in gene selection.

Keywords: L1-type regularization; random lasso; sample-specific analysis; stability selection; varying coefficient model.
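
The two-stage idea described in the abstract (kernel-weighted random sampling followed by a kernel-weighted L1 fit, with selection probabilities estimated over repeated draws) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the modulator variable `z`, the bandwidth `h`, the toy data, and all function names are assumptions made for illustration. The sketch also omits the random feature subsetting of the random lasso, and its final step uses the standard stability selection rule (maximum selection probability over the regularization grid), not the robust rule proposed in the paper, which the abstract does not specify.

```python
import math
import random


def soft_threshold(value, t):
    """Soft-thresholding operator used by L1 coordinate descent."""
    return math.copysign(max(abs(value) - t, 0.0), value)


def kernel_weights(z, z0, h):
    """Gaussian kernel weight of each sample relative to the target value z0 (assumed kernel)."""
    return [math.exp(-((zi - z0) ** 2) / (2.0 * h * h)) for zi in z]


def weighted_lasso(X, y, w, lam, n_iter=30):
    """Coordinate descent for 0.5 * sum_i w_i (y_i - x_i'b)^2 + lam * sum_j |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Weighted correlation of feature j with the partial residual.
            rho = sum(w[i] * col[i] * (resid[i] + col[i] * beta[j]) for i in range(n))
            denom = sum(w[i] * col[i] ** 2 for i in range(n))
            new = soft_threshold(rho, lam) / denom if denom > 0.0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                resid = [resid[i] - col[i] * delta for i in range(n)]
                beta[j] = new
    return beta


def selection_probabilities(X, y, z, z0, h, lambdas, n_draws=30, seed=0):
    """Estimate per-feature selection frequencies for a target sample located at z0."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = kernel_weights(z, z0, h)
    counts = {lam: [0] * p for lam in lambdas}
    for _ in range(n_draws):
        # Stage 1: weighted bootstrap draw; samples close to the target are drawn more often.
        idx = rng.choices(range(n), weights=k, k=n // 2)
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        ws = [k[i] for i in idx]
        # Stage 2: kernel-weighted L1 fit on the drawn samples, for each grid value.
        for lam in lambdas:
            beta = weighted_lasso(Xs, ys, ws, lam)
            for j in range(p):
                if beta[j] != 0.0:
                    counts[lam][j] += 1
    return {lam: [c / n_draws for c in counts[lam]] for lam in lambdas}


def stable_set(probs, threshold=0.8):
    """Standard rule: keep features whose maximum frequency over the grid reaches the threshold."""
    p = len(next(iter(probs.values())))
    return [j for j in range(p) if max(pr[j] for pr in probs.values()) >= threshold]


def make_toy_data(n=40, p=5, seed=1):
    """Toy data: feature 0 drives y when z is near 0, feature 1 when z is near 1."""
    rng = random.Random(seed)
    X, y, z = [], [], []
    for i in range(n):
        zi = rng.uniform(0.0, 0.2) if i < n // 2 else rng.uniform(0.8, 1.0)
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        active = 0 if zi < 0.5 else 1
        X.append(row)
        y.append(3.0 * row[active] + 0.1 * rng.gauss(0.0, 1.0))
        z.append(zi)
    return X, y, z


if __name__ == "__main__":
    X, y, z = make_toy_data()
    grid = (2.0, 5.0, 10.0)
    print(stable_set(selection_probabilities(X, y, z, z0=0.1, h=0.1, lambdas=grid)))
    print(stable_set(selection_probabilities(X, y, z, z0=0.9, h=0.1, lambdas=grid)))
```

In this toy setting the selected set depends on the target location `z0`: samples far from the target receive negligible kernel weight in both stages, so they have little influence on the estimated selection probabilities. Because the final rule takes the maximum over the grid, it inherits the sensitivity to the regularization parameter that the paper's robust variant is designed to address.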

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Genomics / methods*
  • Genomics / standards
  • Humans
  • Neoplasms / genetics
  • Patient-Specific Modeling*
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / standards