Prediction of tumor location in prostate cancer tissue using a machine learning system on gene expression data

BMC Bioinformatics. 2020 Mar 11;21(Suppl 2):78. doi: 10.1186/s12859-020-3345-9.

Abstract

Background: Finding the tumor location in the prostate is an essential pathological step for prostate cancer diagnosis and treatment. The tumor location, or laterality, can be unilateral (the tumor affects one side of the prostate) or bilateral (it affects both sides). However, standard screening methods can overestimate or underestimate the tumor. In this work, a combination of efficient machine learning methods for feature selection and classification is proposed to analyze gene activity and select genes as relevant biomarkers for samples of different laterality.

Results: A data set of 450 samples was used in this study. The samples were divided into three laterality classes (left, right, bilateral). The aim of this work was to understand the genomic activity in each class and to find relevant genes that indicate each class with nearly 99% accuracy. The system identified groups of differentially expressed genes (RTN1, HLA-DMB, MRI1) that can differentiate samples among the three classes.

Conclusion: The proposed method was able to detect sets of genes that can identify the different laterality classes. The resulting genes were found to be strongly correlated with disease progression. HLA-DMB and EIF4G2, both detected in the set of genes that identifies left laterality, were previously reported to belong to the same pathway, the Allograft rejection SuperPath.
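
Illustrative sketch: The abstract does not name the specific feature-selection or classification algorithms, so the following is only a minimal example of the general idea of selecting genes and then classifying samples into three laterality classes. It assumes a one-way ANOVA F-statistic for ranking genes and a nearest-centroid classifier, both chosen here for simplicity and not taken from the paper. The data are synthetic, and every function name (make_synthetic, f_statistic, select_genes, fit_centroids, predict, run_demo) is hypothetical.

```python
import random
from statistics import mean

CLASSES = ("left", "right", "bilateral")


def make_synthetic(n_per_class=30, n_genes=50, n_informative=3, seed=0):
    """Toy expression matrix (NOT the study data): the first
    n_informative genes get a class-dependent shift, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for ci, label in enumerate(CLASSES):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += 3.0 * ((ci + g) % 3)
            X.append(row)
            y.append(label)
    return X, y


def f_statistic(values, labels):
    """One-way ANOVA F-statistic of one gene across the classes."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    between, within = 0.0, 0.0
    for g in groups.values():
        m = mean(g)
        between += len(g) * (m - grand) ** 2
        within += sum((v - m) ** 2 for v in g)
    return (between / (k - 1)) / (within / (n - k))


def select_genes(X, y, top_k):
    """Indices of the top_k genes with the largest F-statistic."""
    n_genes = len(X[0])
    scores = [f_statistic([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_k]


def fit_centroids(X, y, genes):
    """Per-class mean expression over the selected genes."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[g] for r in rows) for g in genes]
    return centroids


def predict(row, centroids, genes):
    """Label of the centroid closest to the sample (squared Euclidean)."""
    def dist(c):
        return sum((row[g] - cj) ** 2 for g, cj in zip(genes, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def run_demo(seed=0):
    X, y = make_synthetic(seed=seed)
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) * 2 // 3
    train, test = idx[:cut], idx[cut:]
    X_train = [X[i] for i in train]
    y_train = [y[i] for i in train]
    # Genes are selected on the training split only, to avoid leakage.
    genes = select_genes(X_train, y_train, top_k=3)
    centroids = fit_centroids(X_train, y_train, genes)
    correct = sum(predict(X[i], centroids, genes) == y[i] for i in test)
    return genes, correct / len(test)


if __name__ == "__main__":
    genes, accuracy = run_demo()
    print("selected gene indices:", genes)
    print("held-out accuracy:", accuracy)
```

Selecting genes on the training split alone keeps the held-out accuracy an honest estimate. Ranking genes on all samples before splitting would inflate it.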

Keywords: Biomarkers; Classification; Machine learning; Prostate cancer laterality.

MeSH terms

  • Area Under Curve
  • Autoantigens / genetics
  • Autoantigens / metabolism
  • Biomarkers, Tumor / genetics
  • Biomarkers, Tumor / metabolism
  • Gene Expression Regulation, Neoplastic*
  • Humans
  • Machine Learning*
  • Magnetic Resonance Imaging
  • Male
  • Phosphoproteins / genetics
  • Phosphoproteins / metabolism
  • Prostate / diagnostic imaging
  • Prostatic Neoplasms / diagnostic imaging
  • Prostatic Neoplasms / genetics
  • Prostatic Neoplasms / pathology*
  • ROC Curve
  • Ribonuclease P / genetics
  • Ribonuclease P / metabolism
  • Serine-Arginine Splicing Factors / genetics
  • Serine-Arginine Splicing Factors / metabolism

Substances

  • Autoantigens
  • Biomarkers, Tumor
  • POP7 protein, human
  • Phosphoproteins
  • SRSF6 protein, human
  • Serine-Arginine Splicing Factors
  • Ribonuclease P