Prediction of biomarker-disease associations based on graph attention network and text representation

Brief Bioinform. 2022 Sep 20;23(5):bbac298. doi: 10.1093/bib/bbac298.

Abstract

Motivation: The associations between biomarkers and human diseases play a key role in understanding complex pathology and developing targeted therapies. Wet lab experiments for biomarker discovery are costly, laborious and time-consuming. Computational prediction methods can be used to greatly expedite the identification of candidate biomarkers.

Results: Here, we present a novel computational model named GTGenie for predicting biomarker-disease associations based on graph and text features. In GTGenie, a graph attention network is used to characterize diverse similarities of biomarkers and diseases from heterogeneous information resources. Meanwhile, a pretrained BERT-based model is applied to learn text-based representations of biomarker-disease relations from the biomedical literature. The captured graph and text features are then integrated in a bimodal fusion network to model the hybrid entity representation. Finally, inductive matrix completion is adopted to infer the missing entries and reconstruct the relation matrix, from which unknown biomarker-disease associations are predicted. Experimental results on the HMDD, HMDAD and LncRNADisease data sets showed that GTGenie achieves prediction performance competitive with other state-of-the-art methods.
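
Illustration: the final step described above, inductive matrix completion, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the GTGenie implementation. It assumes a low-rank bilinear model in which the relation matrix A is approximated by X W H^T Y^T, where X and Y are per-biomarker and per-disease feature matrices. In GTGenie these would be the fused graph and text representations; here they are random stand-ins. The function name, the hyperparameters, the plain gradient-descent solver and the choice to treat every entry of A as observed are all assumptions made for illustration.

```python
import numpy as np


def inductive_matrix_completion(A, X, Y, rank=3, lam=0.1, lr=0.02,
                                epochs=2000, seed=0):
    """Fit A ~ X @ W @ H.T @ Y.T by plain gradient descent.

    A: (m, n) relation matrix (1 = known association, 0 = unknown).
    X: (m, p) biomarker features; Y: (n, q) disease features.
    Returns the (m, n) matrix of predicted association scores.
    Hypothetical helper: hyperparameters are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    for _ in range(epochs):
        # Residual of the current reconstruction (all entries treated
        # as observed, which is a simplifying assumption).
        residual = X @ W @ H.T @ Y.T - A
        # Gradients of 0.5*||residual||_F^2 + 0.5*lam*(||W||^2 + ||H||^2).
        grad_W = X.T @ residual @ Y @ H + lam * W
        grad_H = Y.T @ residual.T @ X @ W + lam * H
        W -= lr * grad_W
        H -= lr * grad_H
    return X @ W @ H.T @ Y.T


# Toy data, made up for illustration: 6 biomarkers, 5 diseases, 3 features.
# Orthonormal feature columns keep the fixed step size stable.
rng = np.random.default_rng(0)
X = np.linalg.qr(rng.standard_normal((6, 3)))[0]
Y = np.linalg.qr(rng.standard_normal((5, 3)))[0]
A = (X @ rng.standard_normal((3, 3)) @ Y.T > 0).astype(float)

scores = inductive_matrix_completion(A, X, Y)

# Rank the pairs without a known association by predicted score.
unknown = np.argwhere(A == 0)
ranked = unknown[np.argsort(-scores[A == 0])]
```

Each row of `ranked` is a (biomarker index, disease index) pair, ordered from the highest to the lowest predicted score. Because the scores depend only on the feature matrices and the learned W and H, the same model can score a biomarker or disease that has no known associations, provided its feature vector is available. With real, unnormalized features the learning rate would likely need tuning.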

Availability: The source code of GTGenie and the test data are available at: https://github.com/Wolverinerine/GTGenie.

Keywords: bimodal fusion network; graph attention network; lncRNA–disease associations; miRNA–disease associations; microbe–disease associations; text-based relation representation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology* / methods
  • Humans
  • Software*