Association prediction of CircRNAs and diseases using multi-homogeneous graphs and variational graph auto-encoder

Comput Biol Med. 2022 Dec;151(Pt A):106289. doi: 10.1016/j.compbiomed.2022.106289. Epub 2022 Nov 11.

Abstract

As a non-coding RNA molecule with a closed-loop structure, circular RNA (circRNA) shows tissue-specific and cell-specific expression patterns. It regulates disease development by modulating the expression of disease-related genes, so exploring circRNA-disease relationships can reveal the molecular mechanisms of disease pathogenesis. Biological experiments for detecting circRNA-disease associations are time-consuming and laborious. Moreover, because known circRNA-disease associations are sparse, existing algorithms cannot obtain sufficiently complete structural information to represent features accurately. To this end, this paper proposes a new predictor, VGAERF, which combines a Variational Graph Auto-Encoder (VGAE) with a Random Forest (RF). First, a circRNA homogeneous graph and a disease homogeneous graph are constructed from Gaussian interaction profile (GIP) kernel similarity, semantic similarity, and known circRNA-disease associations. Two VGAEs with the same structure are then employed to extract higher-order features by encoding and decoding the input graph structures. To further increase the completeness of the network structure information, the deep features acquired from the two VGAEs are summed and used to train an RF, which can handle sparse data, to perform the prediction task. On the independent test set, the Area Under the ROC Curve (AUC), accuracy, and Area Under the PR Curve (AUPR) of the proposed method reach 0.9803, 0.9345, and 0.9894, respectively. On the same dataset, the AUC, accuracy, and AUPR of VGAERF are 2.09%, 5.93%, and 1.86% higher, respectively, than those of the best-performing competing method (AEDNN). It is anticipated that VGAERF will provide significant information for deciphering the molecular mechanisms of circRNA-disease associations and will promote the diagnosis of circRNA-related diseases.
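
The GIP kernel similarity mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes the commonly used formulation in which the similarity of two entities is exp(-gamma * ||p_i - p_j||^2), where p_i is the binary interaction profile (one row of the association matrix) and gamma is a bandwidth parameter divided by the mean squared norm of all profiles. The function name `gip_kernel_similarity`, the parameter `gamma_prime` with its default of 1.0, and the toy matrix are hypothetical; the abstract does not state the bandwidth used. The sketch uses only the standard library so that it runs without extra dependencies.

```python
import math


def gip_kernel_similarity(assoc, gamma_prime=1.0):
    """Return the GIP kernel similarity matrix for the rows of `assoc`.

    `assoc` is a list of equal-length rows of 0/1 values (interaction
    profiles). `gamma_prime` is an assumed bandwidth parameter.
    """
    n = len(assoc)
    # Squared Euclidean norm of each interaction profile.
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("association matrix contains no known associations")
    # Normalise the bandwidth by the average profile norm.
    gamma = gamma_prime / mean_sq

    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Squared distance between the two interaction profiles.
            d2 = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = sim[j][i] = math.exp(-gamma * d2)
    return sim


# Toy association matrix (made up): 3 circRNAs (rows) x 4 diseases (columns).
A = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]

# circRNA similarity uses the rows; disease similarity uses the columns.
circ_sim = gip_kernel_similarity(A)
disease_sim = gip_kernel_similarity([list(col) for col in zip(*A)])
```

In this toy example the first two circRNAs have identical interaction profiles, so their similarity is 1, while the third shares no disease with them and scores lower. In a pipeline like the one described, such similarity matrices would be combined with semantic similarity to define the edges of each homogeneous graph before it is passed to a VGAE.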

Keywords: CircRNA-disease association; Graph neural network; Random forest; Variational graph auto-encoder.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Area Under Curve
  • Female
  • Humans
  • Labor, Obstetric*
  • Pregnancy
  • RNA, Circular* / genetics
  • Semantics

Substances

  • RNA, Circular