Optimal microRNA Sequencing Depth to Predict Cancer Patient Survival with Random Forest and Cox Models

Genes (Basel). 2022 Dec 2;13(12):2275. doi: 10.3390/genes13122275.

Abstract

(1) Background: Tumor profiling enables the prediction of patient survival. When designing a study based on tumor profiles from a cohort, two essential parameters must be calibrated: the sequencing depth of the RNA-seq technology and the number of patients. This calibration is carried out under cost constraints, so a compromise has to be found. In the context of survival data, the goal of this work is to benchmark the impact of the number of patients and of the sequencing depth of miRNA-seq and mRNA-seq on the predictive capabilities of both the Cox model with elastic net penalty and the random survival forest.

(2) Results: We first show that the Cox model and the random survival forest provide comparable prediction capabilities, with significant differences for some cancers. Second, we demonstrate that miRNA and/or mRNA data improve prediction over clinical data alone. mRNA-seq data lead to slightly better prediction than miRNA-seq data, with the notable exception of lung adenocarcinoma, for which the tumor miRNA profile shows higher predictive power. Third, we demonstrate that the sequencing depth of RNA-seq data can be reduced for most of the investigated cancers without degrading prediction performance, which allows independent validation sets to be created at a lower cost. Finally, we show that the number of patients in the training dataset can be reduced for both the Cox model and the random survival forest, which allows different models to be used on different patient subgroups.
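The abstract does not say how reduced sequencing depths are obtained. One common way to emulate a shallower sequencing run from existing data, assumed here rather than taken from the paper, is to subsample reads without replacement from each sample's per-gene (or per-miRNA) count vector. The following is a minimal, standard-library-only Python sketch of that idea; the function names `downsample_counts` and `downsample_matrix` and the `fraction` parameter are hypothetical.

```python
import bisect
import itertools
import random


def downsample_counts(counts, target_depth, rng):
    """Keep `target_depth` reads drawn without replacement from one sample.

    counts: non-negative integer read counts, one entry per gene/miRNA.
    Returns a new list of counts summing to `target_depth`.
    """
    total = sum(counts)
    if not 0 <= target_depth <= total:
        raise ValueError("target_depth must lie between 0 and the library size")
    # Cumulative boundaries: read index r belongs to the first gene g
    # whose cumulative count exceeds r.
    cum = list(itertools.accumulate(counts))
    reduced = [0] * len(counts)
    # Sampling from a range object picks distinct read indices without
    # materialising one list entry per read.
    for r in rng.sample(range(total), target_depth):
        reduced[bisect.bisect_right(cum, r)] += 1
    return reduced


def downsample_matrix(matrix, fraction, seed=None):
    """Reduce every sample (row) to `fraction` of its original library size."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)  # one seeded generator for reproducibility
    return [downsample_counts(row, round(fraction * sum(row)), rng) for row in matrix]
```

In this hypothetical setup, a samples-by-miRNA count table would be passed through `downsample_matrix(table, 0.1, seed=0)` to emulate one tenth of the original depth before refitting the survival models. Sampling without replacement guarantees that no gene receives more reads than it had originally; independent binomial thinning of each count is a common alternative that only matches the target depth on average.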

Keywords: Cox model; cancer; microRNA; random survival forest model; sequencing depth; survival.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression Profiling
  • Humans
  • Lung Neoplasms* / genetics
  • MicroRNAs* / genetics
  • Proportional Hazards Models
  • RNA, Messenger / genetics
  • Random Forest

Substances

  • MicroRNAs
  • RNA, Messenger

Grants and funding

This article was developed in the framework of the Grenoble Alpes Data Institute, supported by the French National Research Agency under the Investissements d’avenir programme (ANR-15-IDEX-02).