CRBSP: Prediction of CircRNA-RBP Binding Sites Based on Multimodal Intermediate Fusion

IEEE/ACM Trans Comput Biol Bioinform. 2023 Sep-Oct;20(5):2898-2906. doi: 10.1109/TCBB.2023.3272400. Epub 2023 Oct 9.

Abstract

Circular RNA (CircRNA) is widely expressed and has physiological and pathological significance, regulating post-transcriptional processes through its protein-binding activity. However, whereas linear RNA and RNA-binding proteins (RBPs) have been studied extensively, little is known about the binding sites of CircRNA. This report describes CRBSP, an intermediate multimodal data fusion strategy for predicting CircRNA-RBP binding sites. CRBSP represents the trinucleotide semantic, location, composition and frequency information of CircRNA using four corresponding encoding methods: Word to vector (Word2vec), Position-specific trinucleotide propensity (PSTNP), Pseudo trinucleotide composition (PseTNC) and Trinucleotide composition (TNC), respectively. A Convolutional Neural Network (CNN) was used to extract global information, and a bidirectional Long Short-Term Memory (BiLSTM) encoder with a Long Short-Term Memory (LSTM) decoder was used to extract local sequence information. A self-attention mechanism then enhanced the contributions of key features, after which the four enhanced features were combined by intermediate fusion. With a Logistic Regression (LR) classifier, CRBSP achieved a mean AUC of 0.9362 under 5-fold cross-validation across all 37 datasets, outperforming five current state-of-the-art models. A similar evaluation on linear RNA-RBP binding sites gave an AUC of 0.7615, also higher than other prediction methods, demonstrating the robustness of CRBSP.
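To make the frequency encoding concrete, the following is a minimal sketch of a trinucleotide composition (TNC) feature vector. It is not the authors' code. The function name `tnc_encode`, the overlapping stride-1 window, the conversion of T to U, and the skipping of windows containing ambiguous bases are all assumptions made for illustration.

```python
from itertools import product

# The 64 possible RNA trinucleotides in a fixed order (assumed ordering).
ALPHABET = "ACGU"
TRINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=3)]


def tnc_encode(seq):
    """Return a 64-dimensional vector of trinucleotide frequencies.

    Hypothetical helper: windows overlap with stride 1, and any window
    containing a character outside A/C/G/U is skipped.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        # Sequence too short or fully ambiguous: return an all-zero vector.
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[k] / total for k in TRINUCLEOTIDES]


if __name__ == "__main__":
    vec = tnc_encode("AUGAUG")
    # Print only the non-zero entries of the feature vector.
    for kmer, freq in zip(TRINUCLEOTIDES, vec):
        if freq > 0:
            print(kmer, freq)
```

In a pipeline like the one described, a vector of this kind would be one of four per-sequence representations, alongside the Word2vec, PSTNP and PseTNC encodings, before feature extraction and fusion.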

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites
  • Neural Networks, Computer*
  • Protein Binding
  • RNA / genetics
  • RNA / metabolism
  • RNA, Circular* / genetics
  • RNA, Circular* / metabolism

Substances

  • RNA, Circular
  • RNA