Classification and Functional Analysis between Cancer and Normal Tissues Using Explainable Pathway Deep Learning through RNA-Sequencing Gene Expression

Int J Mol Sci. 2021 Oct 26;22(21):11531. doi: 10.3390/ijms222111531.

Abstract

Deep learning has proven advantageous in solving cancer diagnostic and classification problems. However, it cannot explain the rationale behind its decisions in terms that humans can readily interpret. Biological pathway databases provide well-studied relationships between genes and their pathways. Because pathways are knowledge frameworks widely used by researchers, representing gene-to-pathway relationships in the structure of a deep learning model may make the model easier to comprehend. Here, we propose a deep neural network (PathDeep) that implements gene-to-pathway relationships in its structure. We also provide an application framework that measures the contribution of pathways and genes within a deep neural network in a classification problem. We applied PathDeep to classify cancer and normal tissues based on a large, publicly available gene expression dataset. PathDeep showed higher accuracy than fully connected neural networks in distinguishing cancer from normal tissues (accuracy = 0.994) across 32 tissues. We identified 42 pathways related to the 32 cancer tissues and 57 associated genes contributing highly to the biological functions of cancer. The most significant pathway was G-protein-coupled receptor signaling, and the most enriched function was the G1/S transition of the mitotic cell cycle, suggesting that these biological functions were the most common cancer characteristics in the 32 tissues.
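
The idea of building gene-to-pathway relationships into a network's structure can be illustrated with a masked dense layer, in which a weight between a gene and a pathway node is kept only if the gene belongs to that pathway. The following is a minimal sketch under stated assumptions and not the PathDeep implementation: the gene names, pathway memberships, the helpers `build_mask` and `pathway_layer`, and the ReLU activation are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy data (not from the paper): 5 genes and 2 pathways.
genes = ["G1", "G2", "G3", "G4", "G5"]
pathways = {"P1": ["G1", "G2", "G3"], "P2": ["G3", "G4"]}


def build_mask(genes, pathways):
    """Return a (n_genes, n_pathways) 0/1 matrix: 1 where the gene is a pathway member."""
    index = {g: i for i, g in enumerate(genes)}
    mask = np.zeros((len(genes), len(pathways)))
    for j, members in enumerate(pathways.values()):
        for g in members:
            mask[index[g], j] = 1.0
    return mask


def pathway_layer(x, weights, bias, mask):
    """Masked dense layer with ReLU (assumed activation).

    Multiplying the weights elementwise by the mask zeroes every
    gene-to-pathway connection that is absent from the membership table.
    """
    return np.maximum(0.0, x @ (weights * mask) + bias)


rng = np.random.default_rng(0)
mask = build_mask(genes, pathways)
weights = rng.normal(size=mask.shape)   # randomly initialised, untrained
bias = np.zeros(mask.shape[1])
x = rng.normal(size=(4, len(genes)))    # 4 samples of made-up expression values

activations = pathway_layer(x, weights, bias, mask)  # shape (4, 2): one value per pathway
```

In this sketch each output column corresponds to one named pathway, so a pathway node's activation depends only on the expression of its member genes. A gene that belongs to no pathway (here `G5`) has no influence on the layer's output.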

Keywords: biological function; cancer gene expression; deep learning; neural networks; pathway.

MeSH terms

  • Databases, Nucleic Acid / statistics & numerical data
  • Deep Learning*
  • Diagnosis, Computer-Assisted
  • Gene Expression Regulation, Neoplastic
  • Gene Regulatory Networks
  • Humans
  • Neoplasms / classification*
  • Neoplasms / diagnosis
  • Neoplasms / genetics*
  • Neural Networks, Computer
  • RNA-Seq / statistics & numerical data*