Sparse modelling of cancer patients' survival based on genomic copy number alterations

J Biomed Inform. 2022 Apr:128:104025. doi: 10.1016/j.jbi.2022.104025. Epub 2022 Feb 16.

Abstract

Copy number alterations (CNA) are structural variations in the genome, in which some regions exhibit more or less than the normal two chromosomal copies. This genomic CNA profile provides critical information about tumour progression and is therefore informative for patients' survival. It is currently a statistical challenge to model patients' survival using their genomic CNA profiles while at the same time identifying regions in the genome that are associated with patients' survival. Some methods have been proposed, including the Cox proportional hazards (PH) model with ridge, lasso, or elastic net penalties. However, these methods do not take the general dependencies between genomic regions into account and produce results that are difficult to interpret. In this paper, we extend the elastic net penalty by introducing an additional penalty that takes into account general dependencies between genomic regions. This new model produces smooth parameter estimates while simultaneously performing variable selection via a sparse solution. The results indicate that the proposed method shows better prediction performance than other models in our simulation study, while enabling us to investigate regions in the genome that are associated with the patients' survival with sensible interpretation. We illustrate the method using a real dataset from a lung cancer cohort and simulated data.
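
To make the idea of an extended elastic net penalty concrete, the following is a minimal Python sketch of a penalized Cox objective. It is an illustration under stated assumptions, not the authors' formulation: the abstract does not give the form of the additional penalty, so the sketch assumes a squared first-difference term between neighbouring genomic regions as one simple way to encourage smooth estimates. The function names and the tuning parameters lam_l1, lam_l2, and lam_smooth are hypothetical, and the partial likelihood assumes no tied event times.

```python
import numpy as np

def cox_neg_partial_loglik(beta, X, time, event):
    """Negative Cox partial log-likelihood, assuming no tied event times."""
    order = np.argsort(-time)            # sort subjects by descending time
    eta = X[order] @ beta                # linear predictor per subject
    ev = event[order]
    # In descending-time order, the risk set of subject i is subjects 0..i,
    # so the log of the risk-set sum is a running log-sum-exp.
    log_risk = np.logaddexp.accumulate(eta)
    return -np.sum(ev * (eta - log_risk))

def penalty(beta, lam_l1, lam_l2, lam_smooth):
    """Elastic net terms plus an ASSUMED smoothness term.

    The squared-difference term between adjacent regions is an assumption
    made for illustration; the paper's penalty may be defined differently.
    """
    l1 = np.sum(np.abs(beta))            # lasso part: encourages sparsity
    l2 = np.sum(beta ** 2)               # ridge part: shrinks coefficients
    smooth = np.sum(np.diff(beta) ** 2)  # assumed: neighbouring regions similar
    return lam_l1 * l1 + lam_l2 * l2 + lam_smooth * smooth

def objective(beta, X, time, event, lam_l1, lam_l2, lam_smooth):
    """Penalized objective that an optimizer would minimize over beta."""
    nll = cox_neg_partial_loglik(beta, X, time, event)
    return nll + penalty(beta, lam_l1, lam_l2, lam_smooth)
```

Here X would hold one row per patient and one column per genomic region, ordered along the genome so that adjacent columns are neighbouring regions. The L1 term is not differentiable at zero, so in practice a method such as coordinate descent or a proximal gradient scheme would be needed to obtain exactly sparse solutions.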

Keywords: Copy number alterations; Cox proportional hazard; Lung cancer; Regression; Sparse solution.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • DNA Copy Number Variations*
  • Genomics / methods
  • Humans
  • Lung Neoplasms* / genetics
  • Proportional Hazards Models