Identification of prognostic and predictive biomarkers in high-dimensional data with PPLasso

BMC Bioinformatics. 2023 Jan 23;24(1):25. doi: 10.1186/s12859-023-05143-0.

Abstract

In clinical trials, the identification of prognostic and predictive biomarkers has become essential to precision medicine. Prognostic biomarkers can be useful for preventing the occurrence of the disease, and predictive biomarkers can be used to identify patients who are likely to benefit from the treatment. Previous research has focused mainly on clinical characteristics, and the use of genomic data in this area has hardly been studied. A new method is therefore required to simultaneously select prognostic and predictive biomarkers in high-dimensional genomic data, where biomarkers are highly correlated. We propose a novel approach called PPLasso, which integrates prognostic and predictive effects into a single statistical model. PPLasso also takes into account the correlations between biomarkers, which can reduce the accuracy of biomarker selection. Our method consists of transforming the design matrix to remove the correlations between the biomarkers before applying the generalized Lasso. In a comprehensive numerical evaluation, we show that PPLasso outperforms the traditional Lasso and other Lasso extensions in identifying both prognostic and predictive biomarkers across various scenarios. Finally, we apply our method to publicly available transcriptomic and proteomic data.
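The two ingredients named above, decorrelating the design matrix and applying a generalized Lasso, can be illustrated with a small NumPy sketch. It is not the PPLasso estimator: how the paper combines the transform with the penalty, and how sparse coefficients are recovered afterwards, is not reproduced here. The two-arm parameterization is an assumption made for illustration: patients in a reference arm contribute `X @ beta1`, patients in an experimental arm contribute `X @ beta2`, prognostic effects are `beta1` and predictive effects are `beta2 - beta1`. With the square, invertible matrix `D = [[I, 0], [-I, I]]`, the generalized Lasso penalty `||D beta||_1` reduces to an ordinary Lasso on `X D^{-1}` through the substitution `theta = D beta`. All function names (`whiten`, `lasso_cd`, `generalized_lasso_invertible`), the simulated data, and `lam` are hypothetical.

```python
import numpy as np


def inverse_sqrt(cov, eps=1e-10):
    """Symmetric inverse square root Sigma^{-1/2} via eigendecomposition."""
    vals, vecs = np.linalg.eigh(cov)
    vals = np.maximum(vals, eps)  # guard against tiny or negative eigenvalues
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def whiten(X):
    """Center X and right-multiply by S^{-1/2}, where S is its sample covariance,
    so the transformed columns are uncorrelated (sample covariance = identity)."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / Xc.shape[0]
    return Xc @ inverse_sqrt(S)


def lasso_cd(X, y, lam, n_iter=500):
    """Ordinary Lasso, min (1/2n)||y - Xb||^2 + lam*||b||_1, by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]  # partial residual with feature j removed
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]
    return b


def generalized_lasso_invertible(X, y, D, lam):
    """min (1/2n)||y - Xb||^2 + lam*||D b||_1 for a square invertible D:
    substitute theta = D b, solve an ordinary Lasso on X D^{-1}, then map back."""
    D_inv = np.linalg.inv(D)
    theta = lasso_cd(X @ D_inv, y, lam)
    return D_inv @ theta, theta


rng = np.random.default_rng(0)
n, p = 200, 5

# Hypothetical two-arm design: arm 0 = reference, arm 1 = experimental.
X = rng.normal(size=(n, p))
arm = np.arange(n) % 2
Xd = np.hstack([X * (arm == 0)[:, None], X * (arm == 1)[:, None]])
beta1 = np.array([2.0, 0, 0, 0, 0])                 # biomarker 0 is prognostic
beta2 = beta1 + np.array([0, 1.5, 0, 0, 0])         # biomarker 1 is predictive
y = Xd @ np.concatenate([beta1, beta2]) + 0.5 * rng.normal(size=n)

I, Z = np.eye(p), np.zeros((p, p))
D = np.block([[I, Z], [-I, I]])                     # D @ beta = (beta1, beta2 - beta1)
beta_hat, theta_hat = generalized_lasso_invertible(Xd, y, D, lam=0.1)
prognostic, predictive = theta_hat[:p], theta_hat[p:]
print("prognostic:", np.round(prognostic, 2))
print("predictive:", np.round(predictive, 2))

# Whitening demo on strongly correlated biomarkers (AR(1) correlation 0.8, hypothetical).
Sigma = 0.8 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X_corr = rng.normal(size=(n, p)) @ np.linalg.cholesky(Sigma).T
W = whiten(X_corr)
print(np.round(np.cov(W, rowvar=False, bias=True), 6))  # numerically the identity
```

Because `D` is invertible here, the reparameterization is exact. A penalty matrix that is not square or not invertible would require a dedicated generalized Lasso solver instead of this shortcut.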

Keywords: Genomic data; Highly correlated predictors; Variable selection.

MeSH terms

  • Biomarkers
  • Biomarkers, Tumor*
  • Genomics
  • Humans
  • Models, Statistical
  • Prognosis
  • Proteomics*

Substances

  • Biomarkers
  • Biomarkers, Tumor