Blood-based transcriptomic signature panel identification for cancer diagnosis: benchmarking of feature extraction methods

Brief Bioinform. 2022 Sep 20;23(5):bbac315. doi: 10.1093/bib/bbac315.

Abstract

Liquid biopsy has shown promise for cancer diagnosis because it is minimally invasive and offers potential for novel biomarker discovery. However, the low concentration of relevant blood-based biosources and the heterogeneity of samples (i.e. the variability in the relative abundance of the molecules identified) pose major challenges to biomarker discovery. Moreover, the number of molecular measurements or features (e.g. transcript read counts) per sample can be on the order of several thousand, whereas the number of samples is often substantially lower, leading to the curse of dimensionality. These challenges, among others, underscore the importance of a robust biomarker panel identification or feature extraction step, in which relevant molecular measurements are identified prior to classification for cancer detection. In this work, we performed a benchmarking study of 12 feature extraction methods using transcriptomic profiles derived from different blood-based biosources. The methods were assessed in terms of both their predictive performance and the robustness of the resulting biomarker panels in diagnosing cancer or stratifying cancer subtypes. For the comparison, the feature extraction methods were categorized into feature subset selection methods and transformation methods. A transformation feature extraction method, namely partial least squares discriminant analysis, was found to consistently outperform the other methods in terms of classification performance. As part of the benchmarking study, a generic pipeline has been created and made available as an R package to ensure reproducibility of the results and allow for easy extension of this study to other datasets (https://github.com/VafaeeLab/bloodbased-pancancer-diagnosis).
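The distinction between the two categories of feature extraction methods can be illustrated with a toy sketch. This is not code from the authors' R package. It is a minimal Python illustration, assuming a binary cancer/control label coded 1/0. A Welch t-statistic filter stands in for a feature subset selection method (it keeps a subset of the original features), and a one-component PLS regression on the 0/1 label stands in for partial least squares discriminant analysis (it builds a new latent feature from a weighted combination of all features). The function names and the small data matrix are hypothetical and synthetic, not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic_rank(X, y, k):
    """Subset selection (filter): keep the k original features with largest |Welch t|."""
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in range(len(X)) if y[i] == 1]  # class 1 (e.g. cancer)
        b = [X[i][j] for i in range(len(X)) if y[i] == 0]  # class 0 (e.g. control)
        se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
        t = (mean(a) - mean(b)) / se if se > 0 else 0.0
        scores.append((abs(t), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])  # indices of the retained features


def pls_da_fit(X, y):
    """Transformation: one-component PLS1 on a 0/1 class label (minimal PLS-DA)."""
    n, p = len(X), len(X[0])
    x_mean = [mean(row[j] for row in X) for j in range(p)]
    y_mean = mean(y)
    Xc = [[X[i][j] - x_mean[j] for j in range(p)] for i in range(n)]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in feature space of maximal covariance with the label.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    if norm == 0:
        raise ValueError("features carry no covariance with the label")
    w = [v / norm for v in w]
    # Latent scores t = Xc w, then least-squares regression of the label on t.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_mean, y_mean, w, q


def pls_da_predict(model, x):
    """Project a new sample onto the latent component and threshold at 0.5."""
    x_mean, y_mean, w, q = model
    score = sum((x[j] - x_mean[j]) * w[j] for j in range(len(w)))
    return 1 if y_mean + q * score >= 0.5 else 0


# Synthetic toy data: features 0 and 1 differ between classes, 2 and 3 do not.
X = [
    [5.0, 1.0, 3.0, 2.0],
    [6.0, 1.5, 2.0, 3.0],
    [5.5, 0.5, 4.0, 2.5],
    [1.0, 4.0, 3.5, 2.0],
    [1.5, 5.0, 2.5, 3.0],
    [0.5, 4.5, 3.0, 2.5],
]
y = [1, 1, 1, 0, 0, 0]

print(t_statistic_rank(X, y, 2))  # subset of original features
model = pls_da_fit(X, y)
print([pls_da_predict(model, x) for x in X])
```

On this toy matrix the filter retains features 0 and 1, while the PLS component assigns them opposite-signed weights and near-zero weight to the uninformative features. The selected "panel" in the first case is a list of original measurements; in the second it is a weight vector over all measurements, which is why the two categories are compared separately.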

Keywords: biomarker discovery; feature extraction; feature selection; liquid biopsy; transcriptomics.

MeSH terms

  • Algorithms
  • Benchmarking
  • Biomarkers
  • Humans
  • Neoplasms* / diagnosis
  • Neoplasms* / genetics
  • Reproducibility of Results
  • Transcriptome*

Substances

  • Biomarkers