Multi-Omic Graph Transformers for Cancer Classification and Interpretation

Pac Symp Biocomput. 2022;27:373-384.

Abstract

Next-generation sequencing has enabled rapid collection and quantification of 'big' biological data. In particular, multi-omics and the integration of different molecular data such as miRNA and mRNA can provide important insights into disease classification and processes. There is a need for computational methods that can correctly model and interpret these relationships and handle the difficulties of large-scale data. In this study, we develop a novel method of representing miRNA-mRNA interactions to classify cancer. Specifically, graphs are designed to account for the interactions and biological communication between miRNAs and mRNAs, using message-passing and attention mechanisms. Patient-matched miRNA and mRNA expression data are obtained from The Cancer Genome Atlas for 12 cancers, and targeting information is incorporated from TargetScan. A Graph Transformer Network (GTN) is selected because its self-attention mechanisms make the classification highly interpretable. The GTN classifies the 12 different cancers with an accuracy of 93.56% and is compared to a Graph Convolutional Network, Random Forest, Support Vector Machine, and Multilayer Perceptron. While the GTN does not outperform all of the other classifiers in terms of accuracy, it yields highly interpretable results. Multi-omics models are also compared with their single-omics counterparts and generally outperform them. Extensive analysis of attention identifies important targeting pathways and molecular biomarkers based on integrated miRNA and mRNA expression.
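
The graph idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' GTN implementation: it assumes a toy graph with two miRNA nodes and three mRNA nodes, made-up expression values, hypothetical node names, and random projection matrices in place of learned ones. It shows a single-head attention step in which each node aggregates messages from its targeting neighbours, and the per-edge attention weights are the quantities one would later inspect for interpretation.

```python
import numpy as np

def attention_message_pass(x, edges, w_q, w_k, w_v):
    """One single-head attention step over a directed edge list.

    x     : (n_nodes, d_in) node features
    edges : list of (src, dst) pairs; messages flow src -> dst
    Returns the updated features (n_nodes, d_out) and a dict that maps
    each edge to its attention weight.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scale = np.sqrt(q.shape[1])
    out = np.zeros_like(v)  # nodes without incoming edges stay at zero
    attn = {}
    for dst in range(x.shape[0]):
        srcs = [s for s, t in edges if t == dst]
        if not srcs:
            continue
        # Scaled dot-product scores between the receiving node and its neighbours
        scores = np.array([q[dst] @ k[s] for s in srcs]) / scale
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        for w, s in zip(weights, srcs):
            out[dst] += w * v[s]
            attn[(s, dst)] = float(w)
    return out, attn

# Toy data (all values are illustrative assumptions, not from the study).
nodes = ["miR-A", "miR-B", "gene-1", "gene-2", "gene-3"]  # hypothetical names
# Feature columns: [expression level, is_miRNA flag]
x = np.array([[0.8, 1.0],
              [0.1, 1.0],
              [0.5, 0.0],
              [0.3, 0.0],
              [0.9, 0.0]])
# miRNA -> mRNA targeting pairs, as a targeting database might supply them
targets = [(0, 2), (0, 3), (1, 3), (1, 4)]
# Add reverse edges so information flows in both directions
edges = targets + [(t, s) for s, t in targets]

rng = np.random.default_rng(0)
w_q, w_k, w_v = (rng.normal(size=(2, 4)) for _ in range(3))  # stand-ins for learned weights

h, attn = attention_message_pass(x, edges, w_q, w_k, w_v)
for (s, t), w in sorted(attn.items()):
    print(f"{nodes[s]} -> {nodes[t]}: attention {w:.3f}")
```

In a trained model the projection matrices would be learned from the classification loss, and a graph-level readout over the updated node features would feed the cancer-type classifier; both are omitted here to keep the sketch short.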

MeSH terms

  • Computational Biology
  • High-Throughput Nucleotide Sequencing
  • Humans
  • MicroRNAs* / genetics
  • Neoplasms* / genetics
  • RNA, Messenger / genetics

Substances

  • MicroRNAs
  • RNA, Messenger