LPI-CNNCP: Prediction of lncRNA-protein interactions by using convolutional neural network with the copy-padding trick

Anal Biochem. 2020 Jul 15:601:113767. doi: 10.1016/j.ab.2020.113767. Epub 2020 May 23.

Abstract

Long noncoding RNAs (lncRNAs) play critical roles in many pathological and biological processes, such as post-transcription, cell differentiation and gene regulation. A growing number of studies have shown that lncRNAs function mainly through interactions with specific RNA-binding proteins (RBPs). However, experimental identification of potential lncRNA-protein interactions is costly and time-consuming. In this work, we propose a novel convolutional neural network (CNN)-based method with the copy-padding trick (named LPI-CNNCP) to predict lncRNA-protein interactions. The copy-padding trick of LPI-CNNCP converts variable-length protein/RNA sequences into fixed-length sequences, thus enabling the construction of the CNN model. A high-order one-hot encoding is also applied to transform the protein/RNA sequences into image-like inputs that capture the dependencies among amino acids (or nucleotides). Finally, these encoded protein/RNA sequences are fed into a CNN to predict the lncRNA-protein interactions. In a 10-fold cross-validation (10CV) test, LPI-CNNCP shows the best performance among the compared state-of-the-art methods. Results on the independent test demonstrate that LPI-CNNCP can effectively predict potential lncRNA-protein interactions. We also compared the copy-padding trick with two existing tricks (i.e., zero-padding and cropping), and the results show that the copy-padding trick outperforms both on predicting lncRNA-protein interactions. The source code of LPI-CNNCP and the datasets used in this work are available at https://github.com/NWPU-903PR/LPI-CNNCP for academic users.
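
As an illustration of the two preprocessing ideas named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that copy-padding means repeating a sequence until a target length is reached (and truncating anything longer), and that high-order one-hot encoding means one-hot encoding each overlapping k-mer. The function names `copy_pad` and `high_order_one_hot`, the default alphabet, and the `order` parameter are hypothetical choices made for this example.

```python
from itertools import product


def copy_pad(seq: str, target_len: int) -> str:
    """Repeat `seq` until it is `target_len` long (assumed reading of copy-padding).

    Sequences longer than `target_len` are truncated; that choice is an
    assumption of this sketch, not something stated in the abstract.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    repeats = -(-target_len // len(seq))  # ceiling division
    return (seq * repeats)[:target_len]


def high_order_one_hot(seq: str, alphabet: str = "ACGU", order: int = 3):
    """One-hot encode every overlapping k-mer of `seq` (k = `order`).

    Returns a list of rows, one per k-mer position, each of length
    len(alphabet) ** order. K-mers containing symbols outside `alphabet`
    produce an all-zero row.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=order)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    n_positions = max(0, len(seq) - order + 1)
    matrix = []
    for pos in range(n_positions):
        row = [0] * len(kmers)
        col = index.get(seq[pos:pos + order])
        if col is not None:
            row[col] = 1
        matrix.append(row)
    return matrix


# Usage: pad a short RNA fragment to a fixed length, then encode it.
padded = copy_pad("ACGU", 10)
encoded = high_order_one_hot(padded, order=2)
```

Under these assumptions every input yields a matrix of the same shape (here 9 rows by 16 columns), which is the property a CNN with a fixed input size needs. Unlike zero-padding, the padded region carries repeated sequence content rather than uninformative filler.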

Keywords: Convolutional neural network; Copy-padding trick; High-order one-hot encoding; lncRNA-protein interactions.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Humans
  • Neural Networks, Computer*
  • RNA, Long Noncoding / chemistry*
  • RNA-Binding Proteins / chemistry*

Substances

  • RNA, Long Noncoding
  • RNA-Binding Proteins