Reference-free transcriptome signatures for prostate cancer prognosis

BMC Cancer. 2021 Apr 12;21(1):394. doi: 10.1186/s12885-021-08021-1.

Abstract

Background: RNA-seq data are increasingly used to derive prognostic signatures for cancer outcome prediction. A limitation of current predictors is their reliance on reference gene annotations, which means that large numbers of non-canonical RNAs produced in disease tissues are ignored. A recently introduced type of transcriptome classifier operates entirely reference-free, relying instead on k-mers extracted from patient RNA-seq data.
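The abstract does not say how the k-mers are extracted or counted. As a minimal sketch only, assuming forward-strand counting of raw reads with no normalization, the code below shows how a k-mer-by-sample count matrix, the input of a reference-free classifier, could be built without consulting any gene annotation. The function names `count_kmers` and `kmer_count_matrix` are hypothetical; real pipelines usually rely on dedicated k-mer counting tools and often collapse reverse complements for unstranded libraries.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_kmers(reads: List[str], k: int) -> Counter:
    """Count every length-k substring across one sample's reads.

    Hypothetical illustration: forward strand only, no reverse-complement
    collapsing and no normalization by library size.
    """
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers that contain ambiguous bases
                counts[kmer] += 1
    return counts


def kmer_count_matrix(
    samples: Dict[str, List[str]], k: int
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a k-mer x sample count matrix; no reference annotation is used."""
    per_sample = {name: count_kmers(reads, k) for name, reads in samples.items()}
    all_kmers = sorted(set().union(*per_sample.values()))  # rows, sorted for reproducibility
    names = list(samples)
    # Counter returns 0 for k-mers absent from a sample
    matrix = [[per_sample[n][km] for n in names] for km in all_kmers]
    return all_kmers, names, matrix
```

For example, `kmer_count_matrix({"s1": ["ACGTA"], "s2": ["CGTAN"]}, 3)` yields rows for `ACG`, `CGT` and `GTA`; the k-mer `TAN` is dropped because it contains an ambiguous base. Any sequence present in the reads, including one from an unannotated lncRNA, becomes a candidate feature.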

Methods: In this paper, we set out to compare conventional and reference-free signatures for risk and relapse prediction in prostate cancer. To compare the two approaches as fairly as possible, we set up a common procedure that takes as input either a k-mer count matrix or a gene expression matrix, extracts a signature, and evaluates this signature on an independent dataset.
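The abstract does not specify the feature-selection method or the classifier used in this common procedure. Purely as a hedged illustration of the idea, the sketch below ranks features on a training cohort with a t-like score, keeps the top `n_features` as the signature, scores an independent cohort with a nearest-centroid rule, and reports the ROC AUC. Because it only sees a samples-by-features matrix, the same code accepts gene expression values or normalized k-mer counts. All names (`select_signature`, `train_and_validate`, `n_features`) and the choice of statistic and classifier are assumptions, not the authors' method.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence

# Rows = samples, columns = features (genes or k-mers); assumed already normalized.
Matrix = List[List[float]]


def select_signature(X: Matrix, y: Sequence[int], n_features: int) -> List[int]:
    """Hypothetical step: rank features by a t-like score between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        # Welch-style standard error; the small constant avoids division by zero
        spread = math.sqrt(pstdev(a) ** 2 / len(a) + pstdev(b) ** 2 / len(b)) + 1e-9
        scores.append(abs(mean(b) - mean(a)) / spread)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:n_features]


def centroid(X: Matrix, y: Sequence[int], label: int, features: List[int]) -> List[float]:
    """Mean profile of one class, restricted to the signature features."""
    rows = [row for row, lab in zip(X, y) if lab == label]
    return [mean(row[j] for row in rows) for j in features]


def risk_score(row: List[float], c0: List[float], c1: List[float], features: List[int]) -> float:
    """Positive when the sample lies closer to the class-1 (e.g. high-risk) centroid."""
    x = [row[j] for j in features]
    return math.dist(x, c0) - math.dist(x, c1)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def train_and_validate(X_train: Matrix, y_train: Sequence[int],
                       X_test: Matrix, y_test: Sequence[int], n_features: int = 10):
    """Extract a signature on one cohort and evaluate it on an independent one."""
    features = select_signature(X_train, y_train, n_features)
    c0 = centroid(X_train, y_train, 0, features)
    c1 = centroid(X_train, y_train, 1, features)
    scores = [risk_score(row, c0, c1, features) for row in X_test]
    return features, auc(scores, y_test)
```

The key design point mirrored here is that signature extraction touches only the training cohort, so the AUC on the held-out cohort is not inflated by feature selection; swapping in another statistic or classifier would not change that structure.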

Results: We found that gene-based and k-mer-based classifiers achieved similarly high performance for risk prediction and markedly lower performance for relapse prediction. Interestingly, the reference-free signatures included a set of sequences mapping to novel lncRNAs or to variable regions of cancer driver genes that were not part of the gene-based signatures.

Conclusions: Reference-free classifiers are thus a promising strategy for the identification of novel prognostic RNA biomarkers.

Keywords: Prostate cancer signature; Reference-free transcriptomic; Supervised learning.

MeSH terms

  • Algorithms
  • Biomarkers, Tumor*
  • Computational Biology / methods
  • Gene Expression Profiling
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Male
  • Prognosis
  • Prostatic Neoplasms / genetics*
  • Prostatic Neoplasms / mortality*
  • Prostatic Neoplasms / pathology
  • Recurrence
  • Reproducibility of Results
  • Supervised Machine Learning
  • Transcriptome*

Substances

  • Biomarkers, Tumor