Prediction of RNA-protein interactions by combining deep convolutional neural network with feature selection ensemble method

J Theor Biol. 2019 Jan 14:461:230-238. doi: 10.1016/j.jtbi.2018.10.029. Epub 2018 Oct 12.

Abstract

RNA-protein interaction (RPI) plays an important role in the basic cellular processes of organisms. Unfortunately, because of time and cost constraints, it is difficult to determine RNA-protein relationships on a large scale through biological experiments. There is therefore an urgent need for reliable computational methods that can predict RNA-protein interactions quickly and accurately. In this study, we propose a novel computational method, RPIFSE (predicting RPI with Feature Selection Ensemble method), which predicts RPI from RNA and protein sequence information. First, RPIFSE perturbs the features extracted by a convolutional neural network (CNN) and generates multiple data sets according to the feature weights; it then uses an extreme learning machine (ELM) classifier to classify each of these data sets. Finally, the results of the individual classifiers are combined by weighted voting, and the highest-scoring result is chosen as the final prediction. In 5-fold cross-validation experiments, RPIFSE achieved accuracies of 91.87%, 89.74%, 97.76% and 98.98% on the RPI369, RPI2241, RPI488 and RPI1807 data sets, respectively. To further evaluate the performance of RPIFSE, we compared it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on those data sets. Furthermore, we predicted RPIs on the independent data set NPInter2.0 and drew a network graph based on the prediction results. These promising results demonstrate the effectiveness of RPIFSE and indicate that it could be a useful tool for predicting RPI.
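
The pipeline described in the abstract (weight-guided feature perturbation, one ELM per generated data set, weighted voting) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how feature weights are computed, how the data sets are generated, or how votes are weighted, so the sketch assumes absolute correlation with the label as the feature weight, weighted sampling of feature subsets as the perturbation, and training accuracy as the vote weight. The names `ELM`, `feature_weights`, `fit_ensemble` and `predict_ensemble` are hypothetical, and the input `X` stands in for CNN-extracted features.

```python
import numpy as np


class ELM:
    """Minimal extreme learning machine: fixed random hidden layer,
    output weights solved in closed form by least squares."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.eye(2)[y]  # one-hot targets; labels must be 0 or 1
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def scores(self, X):
        # Per-class scores, shape (n_samples, 2).
        return self._hidden(X) @ self.beta


def feature_weights(X, y):
    # Assumption: weight each feature by |Pearson correlation| with the label.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    w = np.abs((Xc * yc[:, None]).sum(axis=0) / denom) + 1e-6
    return w / w.sum()


def fit_ensemble(X, y, n_members=5, keep=0.6, seed=0):
    # Assumption: each "data set" is a feature subset drawn without
    # replacement, with probability proportional to the feature weight.
    rng = np.random.default_rng(seed)
    p = feature_weights(X, y)
    k = max(1, int(keep * X.shape[1]))
    members = []
    for m in range(n_members):
        idx = rng.choice(X.shape[1], size=k, replace=False, p=p)
        clf = ELM(seed=seed + m).fit(X[:, idx], y)
        # Assumption: the vote weight is the member's training accuracy.
        acc = (clf.scores(X[:, idx]).argmax(axis=1) == y).mean()
        members.append((idx, clf, acc))
    return members


def predict_ensemble(members, X):
    # Weighted voting: sum the weighted class scores, take the best class.
    total = sum(acc * clf.scores(X[:, idx]) for idx, clf, acc in members)
    return total.argmax(axis=1)
```

A typical call would be `members = fit_ensemble(X_train, y_train)` followed by `predict_ensemble(members, X_test)`, with `y_train` an integer array of 0/1 labels. In practice the vote weights would be better estimated on held-out data than on the training set.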

Keywords: Convolution neural network; Extreme learning machine; Position-specific scoring matrix; RNA-protein interaction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Datasets as Topic
  • Neural Networks, Computer*
  • Protein Binding
  • RNA / metabolism*
  • Sequence Analysis
  • Support Vector Machine

Substances

  • RNA