Machine learning and bioinformatics models to identify gene expression patterns of ovarian cancer associated with disease progression and mortality

J Biomed Inform. 2019 Dec:100:103313. doi: 10.1016/j.jbi.2019.103313. Epub 2019 Oct 23.

Abstract

Ovarian cancer (OC) is a common cause of cancer death among women worldwide, so there is a pressing need to identify factors influencing OC mortality. Much OC patient clinical data is publicly accessible via the Broad Institute Cancer Genome Atlas (TCGA) datasets, which include patient age, cancer site, stage, subtype and survival, as well as OC gene transcription profiles. These allow studies that correlate OC patient survival (and other clinical variables) with gene expression in order to identify new OC biomarkers that predict patient mortality. We integrated patient clinical and tissue transcriptome data available from the TCGA portal. We determined the OC mRNA expression levels (compared to normal ovarian tissue) of 41 genes already implicated in OC progression, and assessed how well their OC tissue expression levels predict patient survival. We employed Cox Proportional Hazard regression models to analyse clinical factors and transcriptomic information and to determine the relative effect on survival associated with each factor. Multivariate analysis of the combined data (clinical and gene mRNA expression) found that age and ovary tumour site were significantly correlated with patient survival. Univariate analysis also confirmed significant differences in patient survival time when altered transcription levels of TLR4, BSCL2, CDH1, ERBB2 and SCGB2A1 were evident, while multivariate analysis that considered the 41 genes simultaneously revealed a significant relationship of survival with the TLR4, BSCL2, CDH1, ERBB2 and PTPRE genes. However, analyses that considered all 41 genes together with the clinical variables identified TLR4, BSCL2, CDH1, ERBB2, BRCA2 and SCGB2A1 as independently related to survival in OC. These results indicate that the latter genes influence OC patient survival, i.e., expression levels of these genes provide mechanistic and predictive information in addition to that of the clinical traits. Our study provides strong evidence that these genes are important prognostic indicators of patient survival that give clues to biological processes that underlie OC progression and mortality.
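
For readers unfamiliar with the univariate Cox step described above, the following is a minimal sketch of how a single-covariate Cox proportional hazards model can be fitted by Newton-Raphson on the partial log-likelihood (Breslow handling of ties). It is not the authors' pipeline: the function name fit_univariate_cox, the variable names and the eight-patient toy data are hypothetical and invented purely for illustration, and a real analysis would more likely use an established survival package.

```python
import math

def fit_univariate_cox(times, events, x, n_iter=25, tol=1e-9):
    """Fit a one-covariate Cox model; return (beta, standard_error).

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    x      : covariate (e.g. 1 = altered expression of a gene, 0 = not)
    """
    beta, info = 0.0, 0.0
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue  # censored patients only appear in risk sets
            # Weighted sums over the risk set: everyone still under
            # observation at time t_i.
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            grad += x_i - mean              # score contribution
            info += s2 / s0 - mean * mean   # observed information
        step = grad / info
        beta += step                        # Newton-Raphson update
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)

# Hypothetical toy data (NOT from TCGA): months of follow-up, death
# indicator, and a binary "altered expression" flag for one gene.
times = [5, 8, 12, 14, 20, 24, 30, 36]
events = [1, 1, 1, 0, 1, 1, 0, 1]
altered = [1, 1, 0, 1, 1, 0, 0, 0]

beta, se = fit_univariate_cox(times, events, altered)
hazard_ratio = math.exp(beta)
print(f"beta={beta:.3f}  SE={se:.3f}  hazard ratio={hazard_ratio:.2f}")
```

Here exp(beta) is the estimated hazard ratio for patients with altered expression relative to those without; a value above 1 corresponds to shorter survival. A multivariate model of the kind the abstract describes replaces the scalar beta with a coefficient vector covering the genes and clinical variables.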

Keywords: Clinical factors; Gene expression; Molecular pathways; Ovarian cancer; RNA seq; Survival analysis.

MeSH terms

  • Computational Biology*
  • Computer Simulation*
  • Datasets as Topic
  • Disease Progression
  • Female
  • Gene Expression Regulation, Neoplastic*
  • Humans
  • Machine Learning*
  • Ovarian Neoplasms / genetics*
  • Ovarian Neoplasms / mortality*
  • Ovarian Neoplasms / pathology
  • Survival Analysis