On the complementarity of the consensus-based disorder prediction

Pac Symp Biocomput. 2012:176-87.

Abstract

Intrinsic disorder in proteins plays important roles in transcriptional regulation, translation, and cellular signal transduction. Experimental annotation of disorder lags behind the rapidly growing number of known protein chains, which motivates the development of computational disorder predictors. Some of these methods target specific types (flavors) of disorder, and recent years have shown that consensus-based predictors offer a viable way to improve predictive performance. However, the base predictors in a given consensus are usually selected in an ad hoc manner, based on their availability and on the premise that more is better. We perform a first-of-its-kind investigation that analyzes the complementarity among a dozen recent predictors to identify characteristics of (future) predictors that would lead to further consensus-based improvements in predictive quality. The complementarity of a given set of three base predictors is expressed by the differences in their predictions when compared with each other and with their majority-vote consensus. We propose a regression-based model that quantifies and predicts the quality of the majority-vote consensus of a given triplet of predictors based on their individual predictive performance and their complementarity, measured at the residue and the disorder-segment levels. Our model shows that improved performance is associated with higher (lower) similarity among the three base predictors at the residue (segment) level and with their consensus prediction at the segment (residue) level. We also show that better consensuses use higher-quality base methods. We use our model to predict the best-performing consensus on an independent test dataset, and our empirical evaluation shows that this consensus outperforms the individual methods and other consensus-based predictors in terms of the area under the ROC curve (AUC).
Our study provides insights that could lead to the development of a new generation of consensus-based disorder predictors.
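The abstract does not define its complementarity measures or how the majority-vote consensus is scored for AUC, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes binary per-residue predictions (1 = disordered), takes residue-level similarity as the fraction of agreeing residues, uses a hypothetical segment-level score based on overlap between disordered segments, and scores the consensus by the fraction of base predictors voting "disordered" when computing AUC. All function names (`majority_vote`, `triplet_features`, and so on) are illustrative, and the three input vectors in the usage lines are toy data.

```python
from itertools import combinations, groupby

# Assumption: each prediction is a list of per-residue labels, 1 = disordered, 0 = ordered.

def majority_vote(preds):
    """Residue-wise majority vote over an odd number of binary predictions."""
    n = len(preds)
    return [1 if 2 * sum(col) > n else 0 for col in zip(*preds)]

def vote_fraction(preds):
    """Assumed consensus score: fraction of base predictors voting 'disordered' per residue."""
    n = len(preds)
    return [sum(col) / n for col in zip(*preds)]

def residue_similarity(a, b):
    """Fraction of residues on which two binary predictions agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def segments(pred):
    """Half-open (start, end) intervals of consecutive disordered residues."""
    segs, pos = [], 0
    for label, run in groupby(pred):
        k = len(list(run))
        if label == 1:
            segs.append((pos, pos + k))
        pos += k
    return segs

def segment_similarity(a, b):
    """Hypothetical segment-level score: symmetric fraction of disordered
    segments that overlap at least one segment of the other prediction."""
    sa, sb = segments(a), segments(b)
    if not sa and not sb:
        return 1.0
    if not sa or not sb:
        return 0.0

    def covered(src, dst):
        # Half-open intervals [s, e) and [s2, e2) overlap iff s < e2 and s2 < e.
        hits = sum(any(s < e2 and s2 < e for s2, e2 in dst) for s, e in src)
        return hits / len(src)

    return (covered(sa, sb) + covered(sb, sa)) / 2

def triplet_features(preds):
    """Complementarity features of a triplet: mean pairwise similarity and mean
    similarity to the majority-vote consensus, at residue and segment level."""
    cons = majority_vote(preds)
    pairs = list(combinations(preds, 2))
    return {
        "residue_pairwise": sum(residue_similarity(a, b) for a, b in pairs) / len(pairs),
        "residue_to_consensus": sum(residue_similarity(p, cons) for p in preds) / len(preds),
        "segment_pairwise": sum(segment_similarity(a, b) for a, b in pairs) / len(pairs),
        "segment_to_consensus": sum(segment_similarity(p, cons) for p in preds) / len(preds),
    }

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy usage with three made-up base predictions for an 8-residue chain.
p1 = [0, 0, 1, 1, 1, 0, 0, 0]
p2 = [0, 1, 1, 1, 0, 0, 0, 0]
p3 = [0, 0, 0, 1, 1, 1, 0, 0]
print(majority_vote([p1, p2, p3]))  # [0, 0, 1, 1, 1, 0, 0, 0]
print(triplet_features([p1, p2, p3]))
```

Features of this kind, together with each base predictor's own accuracy, would be plausible inputs to a regression model of the sort the abstract describes; the model form and the exact similarity definitions used in the study are not given here.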

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Computational Biology
  • Consensus Sequence
  • Databases, Protein
  • Linear Models
  • Models, Molecular
  • Protein Conformation
  • Protein Stability
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / genetics

Substances

  • Proteins