Heterogeneous graph attention network based on meta-paths for lncRNA-disease association prediction

Brief Bioinform. 2022 Jan 17;23(1):bbab407. doi: 10.1093/bib/bbab407.

Abstract

Motivation: Discovering long noncoding RNA (lncRNA)-disease associations is a fundamental and critical part of understanding disease etiology and pathogenesis. However, only a few lncRNA-disease associations have been identified, because biological experiments are time-consuming and expensive. An efficient computational method for identifying potential lncRNA-disease associations is therefore urgently needed. Because they can exploit node features and relationships in networks, graph-based learning models are commonly used for such biomolecular association prediction tasks. However, the ability of these methods to comprehensively fuse node features, heterogeneous topological structures and semantic information is far from optimal, or even satisfactory. Moreover, they remain limited in modeling complex associations between lncRNAs and diseases.

Results: In this paper, we develop a novel heterogeneous graph attention network framework based on meta-paths for predicting lncRNA-disease associations, denoted as HGATLDA. First, we construct a heterogeneous network by combining the lncRNA and disease feature structural graphs with the lncRNA-disease topological structural graph. Then, from this heterogeneous graph, we construct multiple metapath-based subgraphs and use a graph attention network to learn node embeddings from the neighbors in these homogeneous and heterogeneous subgraphs. Next, we apply an attention mechanism that adaptively assigns weights to the metapath-based subgraphs, so that the model captures richer semantic information. In addition, we use neural inductive matrix completion to reconstruct lncRNA-disease associations, which captures complicated associations between lncRNAs and diseases. Moreover, we incorporate a cost-sensitive neural network into the loss function to tackle the class imbalance problem that is common in lncRNA-disease association prediction. Finally, extensive experimental results demonstrate the effectiveness of our proposed framework.
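As a rough illustration of two of the steps described above, the sketch below shows one way to weight metapath-specific embeddings with attention and one way to make a loss cost-sensitive. The abstract does not give the actual HGATLDA formulation, so everything here is an assumption: the scoring function follows the semantic-level attention commonly used in heterogeneous graph attention networks, the loss is a plain positive-weighted binary cross-entropy, and the names `fuse_metapath_embeddings`, `cost_sensitive_bce`, `W`, `b`, `q` and `pos_weight` are hypothetical.

```python
import numpy as np


def fuse_metapath_embeddings(embeddings, W, b, q):
    """Fuse P metapath-specific embeddings with attention (assumed form).

    embeddings: array of shape (P, N, d), one (N, d) matrix per metapath subgraph.
    W (d, h), b (h,), q (h,): hypothetical learnable parameters.
    Returns (fused, beta): fused (N, d) embeddings and (P,) attention weights.
    """
    # Score each metapath as the node-averaged value of q . tanh(z W + b).
    hidden = np.tanh(embeddings @ W + b)        # (P, N, h)
    scores = (hidden @ q).mean(axis=1)          # (P,)
    # Softmax over metapaths, shifted by the max for numerical stability.
    exp_scores = np.exp(scores - scores.max())
    beta = exp_scores / exp_scores.sum()
    # Weighted sum of the metapath-specific embeddings.
    fused = np.tensordot(beta, embeddings, axes=1)  # (N, d)
    return fused, beta


def cost_sensitive_bce(y_true, y_prob, pos_weight, eps=1e-12):
    """Binary cross-entropy where errors on positives cost `pos_weight` times more.

    This is one common cost-sensitive loss, not necessarily the paper's.
    """
    y_prob = np.clip(y_prob, eps, 1.0 - eps)    # avoid log(0)
    per_item = -(pos_weight * y_true * np.log(y_prob)
                 + (1.0 - y_true) * np.log(1.0 - y_prob))
    return per_item.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(3, 5, 4))              # 3 metapaths, 5 nodes, 4 dims
    W, b, q = rng.normal(size=(4, 6)), np.zeros(6), rng.normal(size=6)
    fused, beta = fuse_metapath_embeddings(Z, W, b, q)
    print("attention weights:", beta)
    print("fused shape:", fused.shape)

    y = np.array([1.0, 0.0, 0.0, 0.0])          # known associations are rare
    p = np.array([0.2, 0.1, 0.1, 0.1])
    print("loss:", cost_sensitive_bce(y, p, pos_weight=5.0))
```

Setting `pos_weight` above 1 penalizes missed positives more heavily than false alarms, which is the usual motivation for a cost-sensitive loss when known associations are far rarer than unlabeled pairs.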

Keywords: cost-sensitive neural network; disease; heterogeneous graph attention network; lncRNA; neural inductive matrix completion.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Neural Networks, Computer
  • RNA, Long Noncoding* / genetics

Substances

  • RNA, Long Noncoding