Prediction of tumor purity from gene expression data using machine learning

Bonil Koo; Je-Keun Rhee

doi:10.1093/bib/bbab163


Brief Bioinform. 2021 Nov 5;22(6):bbab163. doi: 10.1093/bib/bbab163.

Authors

Bonil Koo¹ ², Je-Keun Rhee¹

Affiliations

¹ School of Systems Biomedical Science, Soongsil University, Seoul, Korea.
² Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea.

PMID: 33954576

Abstract

Motivation: Bulk tumor samples used for high-throughput molecular profiling are often an admixture of cancer cells and non-cancerous cells, which include immune and stromal cells. The mixed composition can confound the analysis and affect the biological interpretation of the results, and thus, accurate prediction of tumor purity is critical. Although several methods have been proposed to predict tumor purity using high-throughput molecular data, there has been no comprehensive study on machine learning-based methods for the estimation of tumor purity.

Results: We applied various machine learning models to estimate tumor purity. Overall, the models predicted tumor purity accurately, and their estimates correlated highly with those of well-established gold standard methods. In addition, we identified a small group of genes and demonstrated that they alone could predict tumor purity well. Finally, we confirmed that these genes were mainly involved in the immune system.
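A minimal sketch of the general workflow summarized above (select a small set of genes, fit a regression model to purity, and score the predictions by their correlation with reference purity values), written in Python with NumPy. Everything here is an assumption for illustration: the expression matrix and purity values are synthetic, not the paper's data; closed-form ridge regression stands in for the various models the authors compared; and the correlation-based gene filter and the helper names (`simulate_data`, `select_genes`, `fit_ridge`, `predict`) are hypothetical, not the authors' feature-selection procedure. The authors' actual models are in the repository given under Availability.

```python
import numpy as np


def simulate_data(n_samples=200, n_genes=50, n_informative=5, seed=0):
    """Hypothetical synthetic data, NOT from the paper."""
    rng = np.random.default_rng(seed)
    purity = rng.uniform(0.2, 1.0, size=n_samples)  # fraction of cancer cells
    X = rng.normal(size=(n_samples, n_genes))       # background expression noise
    # Assumption: the first genes mimic immune/stromal markers whose
    # expression falls as the fraction of non-cancerous cells shrinks.
    X[:, :n_informative] += -3.0 * purity[:, None]
    return X, purity


def select_genes(X, y, k):
    """Hypothetical filter: keep the k genes with the largest |Pearson r| to purity."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    return np.argsort(-np.abs(corr))[:k]


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    p = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def predict(X, w, b):
    """Linear prediction clipped to [0, 1], since purity is a fraction."""
    return np.clip(X @ w + b, 0.0, 1.0)


X, y = simulate_data()
train, test = slice(0, 150), slice(150, None)

# Gene selection uses the training split only, so test samples stay unseen.
genes = select_genes(X[train], y[train], k=5)
w, b = fit_ridge(X[train][:, genes], y[train])
y_hat = predict(X[test][:, genes], w, b)

# Agreement with the reference purity values, measured as Pearson correlation.
r = np.corrcoef(y[test], y_hat)[0, 1]
print(sorted(genes.tolist()), round(float(r), 3))
```

On this synthetic data the filter should recover the five planted marker genes, and the held-out correlation should be well above chance. Selecting genes before fitting, and only on the training split, mirrors the idea of a small predictive gene set without leaking test information into the selection step.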

Availability: The machine learning models constructed for this study are available at https://github.com/BonilKoo/ML_purity.

Keywords: cancer genomics; machine learning; regression; tumor purity.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Artifacts
Biomarkers, Tumor
DNA Contamination*
DNA, Neoplasm*
Gene Expression Profiling / methods*
Gene Expression Profiling / standards*
Humans
Machine Learning*
Neoplasms / diagnosis
Neoplasms / genetics*
Reproducibility of Results
Transcriptome*

Substances

Biomarkers, Tumor
DNA, Neoplasm