Robust network-based regularization and variable selection for high-dimensional genomic data in cancer prognosis

Genet Epidemiol. 2019 Apr;43(3):276-291. doi: 10.1002/gepi.22194. Epub 2019 Feb 11.

Abstract

In cancer genomic studies, an important objective is to identify prognostic markers associated with patients' survival. Network-based regularization has achieved success in variable selection for high-dimensional cancer genomic data because of its ability to incorporate the correlations among genomic features. However, because survival time data usually follow skewed distributions and are contaminated by outliers, network-constrained regularization that does not account for robustness leads to false identification of network structure and biased estimation of patients' survival. In this study, we develop a novel robust network-based variable selection method under the accelerated failure time model. Extensive simulation studies show the advantage of the proposed method over alternative methods. Two case studies of lung cancer datasets with high-dimensional gene expression measurements demonstrate that the proposed approach identifies markers with important implications.
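
The abstract does not state the loss function or the penalty, so the following is only an illustrative sketch of what a robust network-penalized objective under an accelerated failure time (AFT) model might look like. It assumes a weighted least absolute deviation loss on log survival times, an L1 sparsity penalty, and a graph Laplacian smoothness penalty built from a gene adjacency matrix. The function names, the `weights` argument (for example, censoring weights), and the tuning parameters `lam1` and `lam2` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def laplacian_from_adjacency(A):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A


def robust_network_objective(beta, X, log_t, weights, A, lam1, lam2):
    """Evaluate an assumed robust network-penalized AFT objective.

    loss       = sum_i w_i * |log(t_i) - x_i' beta|   (absolute loss, less sensitive to outliers)
    sparsity   = lam1 * ||beta||_1                    (variable selection)
    smoothness = lam2 * beta' L beta                  (connected genes get similar coefficients)

    This is an assumption-laden illustration, not the estimator from the paper.
    """
    beta = np.asarray(beta, dtype=float)
    residuals = np.asarray(log_t, dtype=float) - np.asarray(X, dtype=float) @ beta
    lad_loss = np.sum(np.asarray(weights, dtype=float) * np.abs(residuals))
    sparsity = lam1 * np.sum(np.abs(beta))
    smoothness = lam2 * (beta @ laplacian_from_adjacency(A) @ beta)
    return lad_loss + sparsity + smoothness
```

Because the absolute loss grows linearly rather than quadratically in the residual, a single extreme survival time moves this objective far less than it would move a squared-error loss, which is the intuition behind robustness to skewed or contaminated outcomes. The Laplacian term is zero when connected features share the same coefficient and grows as they diverge.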

Keywords: high-dimensional data; lung cancer prognosis; network-based regularization; penalized estimation; robust variable selection.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computer Simulation
  • Gene Expression Regulation, Neoplastic
  • Gene Regulatory Networks*
  • Genome, Human
  • Genomics*
  • Humans
  • Lung Neoplasms / diagnosis*
  • Lung Neoplasms / genetics*
  • Models, Genetic
  • Prognosis