Predicting RNA-binding residues from evolutionary information and sequence conservation

BMC Genomics. 2010 Dec 2;11 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2164-11-S4-S2.

Abstract

Background: RNA-binding proteins (RBPs) play crucial roles in post-transcriptional control of RNA. RBPs efficiently recognize specific RNA sequences once the RNA has been transcribed from DNA. To satisfy diverse functional requirements, RBPs are composed of multiple RNA-binding domains (RBDs) presented in various structural arrangements that provide versatile functions. The ability to computationally predict the RNA-binding residues of an RBP can help biologists select important residues for site-directed mutagenesis in wet-lab experiments.

Results: The proposed prediction framework, named "ProteRNA", combines an SVM-based classifier with conserved-residue discovery by WildSpan to identify the residues of an RNA-binding protein that interact with RNA. Although these conserved residues may be conserved for either functional or structural reasons, they provide clues to the important residues in a protein sequence. On the independent test dataset, ProteRNA achieved an overall accuracy of 89.78%, an MCC of 0.2628, an F-score of 0.3075, and an F0.5-score of 0.3546.
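The four figures reported above are standard per-residue metrics computed from a binary confusion matrix, with RNA-binding residues as positives. A high accuracy alongside a much lower MCC and F-score is what one would expect when positives are rare, since a predictor can be right about most residues simply by labelling them non-binding. The sketch below uses the textbook definitions of these metrics; the function name `binary_metrics` and the counts in the usage line are hypothetical and illustrative only, not ProteRNA's actual confusion matrix or code.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard per-residue metrics from a binary confusion matrix.

    Positives are RNA-binding residues; negatives are non-binding residues.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    def f_beta(beta: float) -> float:
        # F-beta weights recall beta times as much as precision:
        # beta = 1 gives the usual F-score, beta = 0.5 favours precision.
        b2 = beta * beta
        denom = b2 * precision + recall
        return (1 + b2) * precision * recall / denom if denom else 0.0

    # Matthews correlation coefficient; taken as 0 when any marginal is empty.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
        "f1": f_beta(1.0),
        "f0.5": f_beta(0.5),
    }


if __name__ == "__main__":
    # Illustrative, imbalanced counts: few binding residues among many non-binding ones.
    print(binary_metrics(tp=30, fp=20, tn=900, fn=50))
```

With these toy counts the accuracy is 0.93 while the MCC is only about 0.44, which shows how class imbalance can separate the two measures.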

Conclusions: This article presents the design of a sequence-based predictor that combines machine learning and pattern mining approaches to identify the RNA-binding residues of an RNA-binding protein. RNA-binding proteins perform diverse functions when interacting with different categories of RNA because they are composed of multiple copies of RNA-binding domains presented in various structural arrangements, which expands their functional repertoire. Furthermore, predicting the RNA-binding residues of an RNA-binding protein can help biologists select important residues for site-directed mutagenesis in wet-lab experiments.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Artificial Intelligence
  • Base Sequence
  • Computational Biology / methods
  • Conserved Sequence / genetics*
  • Databases, Protein
  • Evolution, Molecular*
  • Humans
  • Pattern Recognition, Automated / methods
  • Predictive Value of Tests
  • Protein Binding / genetics
  • RNA / chemistry*
  • RNA / genetics
  • RNA / metabolism*
  • RNA-Binding Proteins / chemistry*
  • RNA-Binding Proteins / genetics
  • RNA-Binding Proteins / metabolism
  • Reproducibility of Results
  • Software

Substances

  • RNA-Binding Proteins
  • RNA