NSECDA: Natural Semantic Enhancement for CircRNA-Disease Association Prediction

IEEE J Biomed Health Inform. 2022 Oct;26(10):5075-5084. doi: 10.1109/JBHI.2022.3199462. Epub 2022 Oct 4.

Abstract

Increasing evidence suggests that circRNA, one of the most promising emerging biomarkers, is closely related to disease. Exploring the relationship between circRNAs and diseases can provide a novel perspective on disease diagnosis and pathogenesis. Existing circRNA-disease association (CDA) prediction models, however, generally treat all data attributes equally, pay no special attention to the attributes with greater influence, and do not make full use of the correlation and symbiosis between attributes to uncover the latent semantic information in the data. To address these problems, this paper proposes NSECDA, a natural semantic enhancement method for predicting CDA. Specifically, we first treat the circRNA sequence as a biological language and analyze its natural semantic properties using natural language understanding theory; we then integrate these properties with disease attributes and with the Gaussian Interaction Profile (GIP) kernel attributes of circRNAs and diseases, and use a Graph Attention Network (GAT) to focus on the influential attributes and mine deeply hidden features; finally, a Rotation Forest (RoF) classifier is used to accurately determine CDA. On the gold-standard data set CircR2Disease, NSECDA achieved 92.49% accuracy with an AUC of 0.9225. Compared with the model without natural semantic enhancement and with models using other classifiers, NSECDA also shows competitive performance. Additionally, among the 30 highest-scoring CDA pairs with previously unknown associations, 25 have been confirmed by newly reported studies. These results suggest that NSECDA is an effective model for predicting CDA and can provide credible candidates for subsequent wet-lab experiments, thus significantly narrowing the scope of investigation.
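
The abstract names Gaussian Interaction Profile (GIP) kernel attributes but does not give their formula. As a minimal sketch, assuming the commonly used definition K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2) with gamma = gamma' / mean(||IP(k)||^2), the similarity could be computed as below. The function names, the gamma' = 1 default, and the toy association matrix are illustrative assumptions, not taken from the paper.

```python
import math


def gip_kernel(adjacency, gamma_prime=1.0):
    """Return a GIP kernel similarity matrix for the rows of a 0/1 association matrix.

    Each row is the interaction profile of one entity (e.g. one circRNA).
    gamma_prime is an assumed bandwidth parameter, not a value from the paper.
    """
    n = len(adjacency)
    # Squared Euclidean norm of every interaction profile.
    sq_norms = [sum(v * v for v in row) for row in adjacency]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("association matrix contains no known associations")
    # Normalise the bandwidth by the mean squared profile norm.
    gamma = gamma_prime / mean_sq

    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(adjacency[i], adjacency[j]))
            kernel[i][j] = kernel[j][i] = math.exp(-gamma * dist_sq)
    return kernel


def transpose(matrix):
    """Swap rows and columns so that disease profiles become the rows."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical toy data: 3 circRNAs (rows) x 3 diseases (columns).
toy_associations = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
circrna_similarity = gip_kernel(toy_associations)
disease_similarity = gip_kernel(transpose(toy_associations))
```

Under these assumptions, the same function yields circRNA similarity from the rows of the association matrix and disease similarity from its transpose. Both matrices would then be combined with the sequence-derived and disease attributes before the attention step.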

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Humans
  • RNA, Circular* / genetics
  • Semantics*

Substances

  • RNA, Circular