Prediction of the RBP binding sites on lncRNAs using the high-order nucleotide encoding convolutional neural network

Anal Biochem. 2019 Oct 15:583:113364. doi: 10.1016/j.ab.2019.113364. Epub 2019 Jul 16.

Abstract

Long non-coding RNA (lncRNA) plays an important role in cells through the interaction with RNA-binding proteins (RBPs). Finding the RBPs binding sites on the lncRNA chains can help to understand the post-transcriptional regulatory mechanism, exploring the pathogenesis of cancers and possible roles in other diseases. Although many genome-wide RBP experimental techniques can identify the RNA-protein interactions and detect the binding sites on RNA chains, they are still time-consuming, labor-intensive and cost-heavy. Thus, many computational methods have been developed to predict the RBPs sites by integrating the RNA sequence, structure and domain specific features, etc. However, current approaches that focus on predicting the RBPs binding sites on RNA chains lack a consideration of the dependencies among nucleotides. In this work, we propose a higher-order nucleotide encoding convolutional neural network-based method (namely HOCNNLB) to predict the RBPs binding sites on lncRNA chains. HOCNNLB first employs a high-order one-hot encoding strategy to encode the lncRNA sequences by considering the dependence among nucleotides, then the encoded lncRNA sequences are fed into the convolutional neural network (CNN) to predict the RBP binding sites. We evaluate HOCNNLB on 31 experimental datasets of 12 lncRNA binding proteins. The average AUC of HOCNNLB achieves 0.953, which is 0.247, 0.175 higher than that of iDeepS and DeepBind, respectively. The average accuracy is 90.2%, which is 26.8%, 19.5% higher than that of iDeepS and DeepBind, respectively. These results demonstrate that HOCNNLB can reliably predict the RBP binding sites on lncRNA chains and outperforms the state-of-the-art methods. The source code of HOCNNLB and the datasets used in this work are available at https://github.com/NWPU-903PR/HOCNNLB for academic users.

Keywords: Binding site; Convolutional neural network; Higher-order one-hot encoding; lncRNA-binding protein.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Binding Sites
  • Computational Biology
  • Databases, Genetic
  • Datasets as Topic
  • Neural Networks, Computer
  • Protein Binding
  • RNA, Long Noncoding / metabolism*
  • RNA-Binding Proteins / metabolism*
  • Software

Substances

  • RNA, Long Noncoding
  • RNA-Binding Proteins