Predicting miRNA-disease associations based on graph attention network with multi-source information

BMC Bioinformatics. 2022 Jun 21;23(1):244. doi: 10.1186/s12859-022-04796-7.

Abstract

Background: A growing body of evidence from biological experiments suggests that microRNAs (miRNAs) play a significant regulatory role in diverse cellular activities and pathological processes. Exploring miRNA-disease associations can not only help decipher pathogenic mechanisms but also suggest treatment solutions for diseases. Because identifying undiscovered relationships between diseases and miRNAs through biological experiments is inefficient, a large number of computational methods have been proposed. However, the prediction accuracy of existing models is hampered by the sparsity of the known association network and by reliance on single-category features, which makes it hard to model the complicated relationships between diseases and miRNAs.

Results: In this study, we propose a new computational framework (GATMDA) that discovers unknown miRNA-disease associations using a graph attention network with multi-source information, effectively fusing linear and non-linear features. In our method, the linear features of diseases and miRNAs are constructed from disease-lncRNA correlation profiles and miRNA-lncRNA correlation profiles, respectively. Then, the graph attention network is employed to extract the non-linear features of diseases and miRNAs by aggregating information from each neighbor with different weights. Finally, the random forest algorithm is applied to infer disease-miRNA correlation pairs by fusing the linear and non-linear features of diseases and miRNAs. GATMDA achieves an average AUC of 0.9566 under five-fold cross-validation, which is superior to previous models. In addition, case studies on breast cancer, colon cancer and lymphoma indicate that 50, 50 and 48 of the top fifty prioritized candidates, respectively, are verified by biological experiments.
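
The neighbor-weighted aggregation and feature fusion described above can be illustrated with a minimal sketch. It assumes a single-head attention layer in the style of the standard graph attention network formulation (a shared linear map, a LeakyReLU-scored attention vector, and a softmax over each node's neighbors). The function names, toy graph, and weights are hypothetical, and this is not the authors' implementation; the random forest step is omitted.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU activation used to score attention logits."""
    return x if x > 0 else slope * x


def matvec(weight, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def gat_layer(features, neighbors, weight, attn):
    """Single-head graph attention aggregation (illustrative only).

    features:  list of node feature vectors
    neighbors: dict mapping node index -> neighbor indices (self included)
    weight:    out_dim x in_dim matrix shared by all nodes
    attn:      attention vector of length 2 * out_dim
    """
    z = [matvec(weight, h) for h in features]
    output = []
    for i in range(len(features)):
        nbrs = neighbors[i]
        # Attention logit for each neighbor j: LeakyReLU(attn . [z_i || z_j]).
        scores = [
            leaky_relu(sum(c * v for c, v in zip(attn, z[i] + z[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Each neighbor contributes with its own weight alpha_ij.
        output.append([
            sum(a * z[j][k] for a, j in zip(alphas, nbrs))
            for k in range(len(z[i]))
        ])
    return output


def fuse(linear, nonlinear):
    """Concatenate linear and non-linear features node by node."""
    return [lin + non for lin, non in zip(linear, nonlinear)]


# Toy example: three nodes with made-up 2-dimensional profiles.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_neighbors = {0: [0, 1], 1: [1, 0, 2], 2: [2, 1]}
toy_weight = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical weights
toy_attn = [0.5, -0.5, 0.25, 0.75]      # hypothetical attention vector

embeddings = gat_layer(toy_features, toy_neighbors, toy_weight, toy_attn)
fused = fuse(toy_features, embeddings)
print(fused)
```

In this sketch the fused vectors would be the input to a downstream classifier; the abstract names a random forest for that role, but its configuration is not given here.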

Conclusions: The extensive experimental results demonstrate the accuracy and utility of GATMDA, and we anticipate that it may serve as a useful tool for identifying unobserved disease-miRNA relationships.

Keywords: Feature fusion; Graph attention network; Random forest; miRNA-disease associations.

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Genetic Predisposition to Disease
  • Humans
  • MicroRNAs* / genetics
  • RNA, Long Noncoding* / genetics

Substances

  • MicroRNAs
  • RNA, Long Noncoding