Network-based multi-task learning models for biomarker selection and cancer outcome prediction

Bioinformatics. 2020 Mar 1;36(6):1814-1822. doi: 10.1093/bioinformatics/btz809.

Abstract

Motivation: Detecting cancer gene expression and transcriptome changes with mRNA-sequencing or array-based data is important for understanding the molecular mechanisms underlying carcinogenesis and the cellular events during cancer progression. In previous studies, differentially expressed genes were detected across patients within a single cancer type. These studies ignored the role of mRNA expression changes in driving tumorigenic mechanisms that are either universal across tumor types or specific to one of them. To address this problem, we introduce two network-based multi-task learning frameworks, NetML and NetSML, to discover common differentially expressed genes shared across different cancer types as well as differentially expressed genes specific to each cancer type. The proposed frameworks use the common latent gene co-expression modules and gene-sample biclusters underlying the multiple cancer datasets to share knowledge across tumor types.
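
The idea of separating signatures shared across cancer types from signatures specific to one type can be illustrated with a minimal sketch. It is not the NetML or NetSML formulation, which the abstract does not specify: it leaves out the co-expression network and bicluster terms entirely, and instead assumes a plain sparse logistic model in which each cancer type's gene weights are the sum of a common vector and a type-specific vector. The function names, penalties and step size are all hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def fit_shared_specific(tasks, lam_common=0.05, lam_specific=0.1,
                        lr=0.1, n_iter=500):
    """Illustrative multi-task sparse logistic regression (an assumption,
    not the published method).

    tasks: list of (X, y) pairs, one per cancer type, where X has shape
           (n_samples, n_genes) and y holds 0/1 labels.
    Returns (common, specific): a weight vector shared by all tasks and a
    list of per-task weight vectors. Task k is scored with common + specific[k].
    """
    n_genes = tasks[0][0].shape[1]
    common = np.zeros(n_genes)
    specific = [np.zeros(n_genes) for _ in tasks]

    for _ in range(n_iter):
        grad_common = np.zeros(n_genes)
        for k, (X, y) in enumerate(tasks):
            # Predicted probability under the combined weights for task k.
            p = 1.0 / (1.0 + np.exp(-(X @ (common + specific[k]))))
            g = X.T @ (p - y) / len(y)  # gradient of the mean logistic loss
            grad_common += g
            # Proximal gradient step on the task-specific weights.
            specific[k] = soft_threshold(specific[k] - lr * g,
                                         lr * lam_specific)
        # Proximal gradient step on the shared weights (gradient averaged
        # over tasks).
        common = soft_threshold(common - lr * grad_common / len(tasks),
                                lr * lam_common)
    return common, specific
```

In this sketch, genes with non-zero entries in `common` would be read as candidate shared signatures, and genes with non-zero entries only in `specific[k]` as candidates specific to cancer type `k`. Penalizing the shared vector less than the specific ones is a design choice that favours explaining a gene once, in the shared vector, when it is informative in every task.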

Results: Large-scale experiments on simulations and real cancer high-throughput datasets validate that the proposed network-based multi-task learning frameworks classify samples more accurately than models that do not share knowledge across different cancer types. The common and cancer-specific molecular signatures detected by the multi-task learning frameworks on The Cancer Genome Atlas ovarian, breast and prostate cancer datasets are correlated with known marker genes and enriched in cancer-relevant Kyoto Encyclopedia of Genes and Genomes pathways and Gene Ontology terms.

Availability and implementation: Source code is available at: https://github.com/compbiolabucf/NetML.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Biomarkers
  • Gene Regulatory Networks
  • Genome
  • Humans
  • Software*
  • Transcriptome*

Substances

  • Biomarkers