A Data Driven Model for Predicting RNA-Protein Interactions based on Gradient Boosting Machine

Sci Rep. 2018 Jun 22;8(1):9552. doi: 10.1038/s41598-018-27814-2.

Abstract

RNA-protein interactions (RPI) play a pivotal role in the regulation of various biological processes. Experimental validation of RPI has been time-consuming, paving the way for computational prediction methods. The major limiting factor of these methods has been the accuracy and confidence of their predictions, and our in-house experiments show that they fail to accurately predict RPI involving short RNA sequences such as TERRA RNA. Here, we present a data-driven model for RPI prediction using a gradient boosting classifier. Amino acids and nucleotides are classified based on high-resolution structural data of RNA-protein complexes. The minimum structural unit, consisting of five residues, is used as the descriptor. Comparison with existing methods shows consistently higher performance of our method, irrespective of the length of the RNA involved in the interaction. The method has been successfully applied to map RPI networks involving both long noncoding RNA and TERRA RNA. The method is also shown to successfully predict RNA and protein hubs present in the RPI networks of four different organisms. The robustness of this method will make it possible to predict as yet unknown interactions in RPI networks for both long noncoding RNA and microRNA.
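The descriptor step can be illustrated with a minimal Python sketch. It assumes that each sequence is first rewritten in a reduced alphabet of residue classes and that the five-residue unit corresponds to overlapping windows of five consecutive residues. The class groupings below (PROTEIN_GROUPS, RNA_GROUPS) and the helper names (build_map, kmer_profile, pair_descriptor) are hypothetical placeholders, not the structure-derived classes or code used in the study.

```python
from collections import Counter
from itertools import product

# HYPOTHETICAL residue classes for illustration only; the study derived its
# classes from high-resolution RNA-protein complex structures.
PROTEIN_GROUPS = {"AVLIMFWC": "0", "GPSTNQYH": "1", "DEKR": "2"}
RNA_GROUPS = {"AG": "0", "CU": "1"}  # purines vs. pyrimidines (assumption)
K = 5  # assumed: five consecutive residues form one descriptor unit


def build_map(groups):
    """Expand {members: label} into a per-residue lookup table."""
    return {res: label for members, label in groups.items() for res in members}


PROTEIN_MAP = build_map(PROTEIN_GROUPS)
RNA_MAP = build_map(RNA_GROUPS)


def kmer_profile(seq, residue_map, k=K):
    """Relative frequencies of overlapping k-residue windows in a reduced alphabet."""
    classes = sorted(set(residue_map.values()))
    vocab = ["".join(p) for p in product(classes, repeat=k)]
    reduced = [residue_map.get(ch) for ch in seq.upper()]
    counts = Counter()
    for i in range(len(reduced) - k + 1):
        window = reduced[i:i + k]
        if None in window:  # skip windows with unrecognized residues (e.g. X, N)
            continue
        counts["".join(window)] += 1
    total = sum(counts.values())
    # Sequences shorter than k (or fully unrecognized) give an all-zero vector.
    return [counts[w] / total if total else 0.0 for w in vocab]


def pair_descriptor(protein_seq, rna_seq):
    """Concatenate protein and RNA profiles into one feature vector per pair."""
    return kmer_profile(protein_seq, PROTEIN_MAP) + kmer_profile(rna_seq, RNA_MAP)
```

For example, pair_descriptor("MKTAYIAKQRQISFVKSHFSRQ", "UUAGGGUUAGGG") returns a 275-element vector (3^5 = 243 protein features plus 2^5 = 32 RNA features), and the 12-nt TERRA-like repeat still yields eight five-residue windows. Vectors for labeled interacting and non-interacting pairs could then be passed to a gradient boosting classifier such as scikit-learn's GradientBoostingClassifier (fit, predict_proba); this pairing is an assumption for illustration, and the study's own feature set and hyperparameters are not reproduced here.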

MeSH terms

  • Amino Acid Sequence
  • Computational Biology / methods*
  • Machine Learning*
  • Protein Binding
  • RNA / metabolism*
  • RNA-Binding Proteins / chemistry
  • RNA-Binding Proteins / metabolism*

Substances

  • RNA-Binding Proteins
  • RNA