Learning global dependencies and multi-semantics within heterogeneous graph for predicting disease-related lncRNAs

Brief Bioinform. 2022 Sep 20;23(5):bbac361. doi: 10.1093/bib/bbac361.

Abstract

Motivation: Long noncoding RNAs (lncRNAs) play an important role in the occurrence and development of diseases. Predicting disease-related lncRNAs can help to understand the pathogenesis of diseases deeply. The existing methods mainly rely on multi-source data related to lncRNAs and diseases when predicting the associations between lncRNAs and diseases. There are interdependencies among node attributes in a heterogeneous graph composed of all lncRNAs, diseases and micro RNAs. The meta-paths composed of various connections between them also contain rich semantic information. However, the existing methods neglect to integrate attribute information of intermediate nodes in meta-paths.

Results: We propose a novel association prediction model, GSMV, to learn and deeply integrate the global dependencies, semantic information of meta-paths and node-pair multi-view features related to lncRNAs and diseases. We firstly formulate the global representations of the lncRNA and disease nodes by establishing a self-attention mechanism to capture and learn the global dependencies among node attributes. Second, starting from the lncRNA and disease nodes, respectively, multiple meta-pathways are established to reveal different semantic information. Considering that each meta-path contains specific semantics and has multiple meta-path instances which have different contributions to revealing meta-path semantics, we design a graph neural network based module which consists of a meta-path instance encoding strategy and two novel attention mechanisms. The proposed meta-path instance encoding strategy is used to learn the contextual connections between nodes within a meta-path instance. One of the two new attention mechanisms is at the meta-path instance level, which learns rich and informative meta-path instances. The other attention mechanism integrates various semantic information from multiple meta-paths to learn the semantic representation of lncRNA and disease nodes. Finally, a dilated convolution-based learning module with adjustable receptive fields is proposed to learn multi-view features of lncRNA-disease node pairs. The experimental results prove that our method outperforms seven state-of-the-art comparing methods for lncRNA-disease association prediction. Ablation experiments demonstrate the contributions of the proposed global representation learning, semantic information learning, pairwise multi-view feature learning and the meta-path instance encoding strategy. Case studies on three cancers further demonstrate our method's ability to discover potential disease-related lncRNA candidates.

Contact: zhang@hlju.edu.cn or peiliangwu@ysu.edu.cn.

Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.

Keywords: Deep learning; Global dependency learning; Meta-path instance level attention; Meta-path semantic encoding; Meta-path type level attention; Multi-view feature learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Neural Networks, Computer
  • RNA, Long Noncoding* / genetics
  • Semantics

Substances

  • RNA, Long Noncoding