PRPS-ST: A protocol-agnostic self-training method for gene expression-based classification of blood cancers

Blood Cancer Discov. 2020 Nov;1(3):244-257. doi: 10.1158/2643-3230.BCD-20-0076. Epub 2020 Sep 10.

Abstract

Gene expression classifiers are increasingly used to stratify tumors into subgroups with distinct biological features. A fundamental limitation shared by current classifiers is the requirement for comparable training and testing data sets. Here, we describe a self-training implementation of our probability ratio-based classification prediction score method (PRPS-ST), which facilitates the porting of existing classification models to other gene expression data sets. Compared with gold standards, PRPS-ST performed favorably in gene expression-based classification of diffuse large B-cell lymphoma (DLBCL) and B-cell acute lymphoblastic leukemia (B-ALL) across a variety of gene expression data types and pre-processing methods, including classifications with a high degree of class imbalance. Tumors classified by our method were significantly enriched for prototypical genetic features of their respective subgroups. This included cases that were unclassifiable by established methods, suggesting that PRPS-ST may offer enhanced sensitivity.
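
The abstract does not spell out the algorithm, so the following is only a rough illustration of how a self-training loop around a probability-ratio score might be organized. It is not the published PRPS-ST procedure. The function names (`log_ratio_scores`, `self_train`), the `seed_frac` parameter, the per-gene Gaussian model, and the idea of seeding provisional classes from signed gene weights of an existing model are all assumptions made for this sketch.

```python
import numpy as np

def log_ratio_scores(Z, mu_a, sd_a, mu_b, sd_b):
    """Per-sample sum over genes of log N(z | class A) - log N(z | class B).
    Assumes an independent Gaussian per gene (an assumption of this sketch)."""
    def log_norm(z, mu, sd):
        return -0.5 * np.log(2 * np.pi * sd ** 2) - (z - mu) ** 2 / (2 * sd ** 2)
    return (log_norm(Z, mu_a, sd_a) - log_norm(Z, mu_b, sd_b)).sum(axis=1)

def self_train(X, weights, seed_frac=0.25, max_iter=20):
    """X: samples x genes matrix from the new data set (at least 4 samples).
    weights: signed per-gene weights taken from an existing model (hypothetical).
    Returns (scores, labels), with label 1 for class A and 0 for class B."""
    n = X.shape[0]
    # Standardize each gene within the new data set only, so no reference
    # to the original training platform is needed.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Seed provisional classes from the extremes of a weighted signature score.
    seed = Z @ weights
    order = np.argsort(seed)
    k = max(2, int(seed_frac * n))
    labels = np.full(n, -1)
    labels[order[-k:]] = 1
    labels[order[:k]] = 0
    scores = seed
    for _ in range(max_iter):
        a, b = Z[labels == 1], Z[labels == 0]
        if len(a) < 2 or len(b) < 2:
            break  # a class collapsed; keep the last usable result
        # Re-estimate class distributions from the current assignments.
        scores = log_ratio_scores(
            Z, a.mean(axis=0), a.std(axis=0) + 1e-6,
            b.mean(axis=0), b.std(axis=0) + 1e-6,
        )
        new_labels = (scores > 0).astype(int)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return scores, labels
```

Because each gene is standardized within the data set being classified, a sketch like this gives the same assignments when the input is rescaled or shifted, which is the kind of protocol independence the abstract describes. How the published method seeds, scores, and stops may well differ.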

Keywords: B-ALL; binary classifier; hematologic; machine learning; molecular subgroup.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression
  • Hematologic Neoplasms* / diagnosis
  • Humans
  • Neoplasms*