Nucleotide-level prediction of CircRNA-protein binding based on fully convolutional neural network

Front Genet. 2023 Oct 6;14:1283404. doi: 10.3389/fgene.2023.1283404. eCollection 2023.

Abstract

Introduction: CircRNA-protein binding plays a critical role in complex biological activity and disease. Various deep learning-based algorithms have been proposed to identify CircRNA-protein binding sites. These methods predict, at the sequence level, whether a CircRNA sequence contains protein binding sites, and they concentrate primarily on analysing the sequence specificity of CircRNA-protein binding. However, they do not accurately predict motif sites that have special functions in gene expression. Methods: Inspired by deep learning models that perform pixel-level binary classification in computer vision, we treated CircRNA-protein binding site prediction as a nucleotide-level binary classification task and used a fully convolutional neural network to identify CircRNA-protein binding motif sites (CPBFCN). Results: CPBFCN provides a new path to predicting CircRNA motifs. Using the MEME tool and existing CircRNA-related and protein-related databases, we analysed the functions of the motifs discovered by CPBFCN. We also investigated the correlation between CircRNA sponges and motif distribution. Furthermore, by comparing the motif distributions obtained with different input sequence lengths, we found that some motifs in the flanking sequences of the CircRNA-protein binding region may contribute to CircRNA-protein binding. Conclusion: This study contributes to identifying CircRNA-protein binding and helps in understanding the role of CircRNA-protein binding in the regulation of gene expression.
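
To illustrate what nucleotide-level prediction means in practice, the following is a minimal sketch, not the CPBFCN architecture itself. It assumes a one-hot encoding over A/C/G/U, a single hidden convolutional layer with kernel width 7 and 16 channels, and random (untrained) weights; none of these choices comes from the abstract, and the function names are hypothetical. The point is only that a fully convolutional stack with length-preserving padding emits one binding probability per nucleotide, in the same way a segmentation network emits one label per pixel.

```python
import numpy as np

ALPHABET = "ACGU"  # assumed encoding order, not taken from the paper


def one_hot(seq):
    """Encode an RNA string as a (length, 4) matrix; unknown bases stay all-zero."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, base in enumerate(seq.upper().replace("T", "U")):
        j = ALPHABET.find(base)
        if j >= 0:
            x[i, j] = 1.0
    return x


def conv1d_same(x, w, b):
    """1D convolution with zero padding so the output keeps the input length.

    x: (L, C_in), w: (K, C_in, C_out) with odd K, b: (C_out,)
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty((x.shape[0], w.shape[2]))
    for i in range(x.shape[0]):
        window = xp[i:i + k]  # (K, C_in) slice centred on position i
        out[i] = np.tensordot(window, w, axes=([0, 1], [0, 1])) + b
    return out


def predict_per_nucleotide(seq, params):
    """Return one probability per nucleotide (hypothetical two-layer network)."""
    hidden = np.maximum(conv1d_same(one_hot(seq), params["w1"], params["b1"]), 0.0)
    logits = conv1d_same(hidden, params["w2"], params["b2"])[:, 0]
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid gives a binary-class probability


# Untrained, randomly initialised weights: illustrative only.
rng = np.random.default_rng(0)
params = {
    "w1": rng.normal(scale=0.1, size=(7, 4, 16)),
    "b1": np.zeros(16),
    "w2": rng.normal(scale=0.1, size=(1, 16, 1)),  # 1x1 convolution as the output head
    "b2": np.zeros(1),
}

probs = predict_per_nucleotide("AUGGCUACGUAGCUAGGCUA", params)
print(probs.shape)  # one value per input nucleotide
```

Because the network contains no fully connected layer, the same weights can be applied to input sequences of different lengths, which is the property that makes comparisons across input lengths possible.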

Keywords: CircRNA-protein binding sites prediction; deep learning; fully convolutional neural networks; hard negative mining loss; nucleotide-level prediction.

Grants and funding

The authors declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported by the National Natural Science Foundation of China (Grant Nos. 62102200, 62002189, and 62002266), the Key Research Program in Higher Education of Henan (Grant No. 22A520036), and the Science and Technology Research Project of Henan Province (No. 232102211058). It was partly supported by the Natural Science Foundation of Shandong Province, China (No. ZR2020QF038), the Technology Small and Medium Enterprises Innovation Capability Improvement Project of Shandong Province (No. 2023TSGC0279), and the Qilu University of Technology (Shandong Academy of Sciences) Talent Scientific Research Project (No. 2023RCKY128).