Forecasting residue-residue contact prediction accuracy

Bioinformatics. 2017 Nov 1;33(21):3405-3414. doi: 10.1093/bioinformatics/btx416.

Abstract

Motivation: Apart from meta-predictors, most of today's methods for residue-residue contact prediction are based entirely on Direct Coupling Analysis (DCA) of correlated mutations in multiple sequence alignments (MSAs). On average, about 40% of the 100 strongest predicted contacts in each protein are correct. An end user working on a single protein of interest cannot tell whether the predictions for that protein are much more or much less accurate than 40%, which is especially problematic when the predicted contacts are used to steer experimental research on that protein.
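To make the "percent correct among the 100 strongest predicted contacts" figure concrete, the following is a minimal sketch of how such a top-N precision could be computed for one protein. The function name top_n_precision, the dictionary-based input format and the toy values are assumptions made for illustration; they are not taken from the paper or its scripts.

```python
def top_n_precision(scores, true_contacts, n=100):
    """Fraction of the n highest-scoring predicted pairs that are true contacts.

    scores: dict mapping (i, j) residue-index pairs to a predicted contact score.
    true_contacts: set of (i, j) pairs observed in the reference structure.
    Both the name and the input format are hypothetical.
    """
    # Normalise pair order so (i, j) and (j, i) count as the same contact.
    normalised = {tuple(sorted(pair)): s for pair, s in scores.items()}
    truth = {tuple(sorted(pair)) for pair in true_contacts}
    # Keep the n pairs with the highest scores.
    ranked = sorted(normalised, key=normalised.get, reverse=True)[:n]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in truth)
    return hits / len(ranked)


# Made-up toy data: five scored pairs, three of which are true contacts.
toy_scores = {(1, 9): 0.9, (2, 14): 0.8, (3, 20): 0.7, (4, 30): 0.2, (5, 40): 0.1}
toy_truth = {(1, 9), (3, 20), (5, 40)}
precision_top3 = top_n_precision(toy_scores, toy_truth, n=3)
```

With real data, n would be 100 and the true contacts would come from a distance threshold applied to a known structure; the threshold is not stated in this abstract, so it is left out of the sketch.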

Results: We designed a regression model that forecasts the accuracy of residue-residue contact prediction for individual proteins with an average error of 7 percentage points. Contacts were predicted with two DCA methods (gplmDCA and PSICOV). The models were built on parameters that describe the MSA, the predicted secondary structure, the predicted solvent accessibility and the contact prediction scores for the target protein. The results show that our models can also be applied to meta-methods, as tested on RaptorX.
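The idea of forecasting per-protein accuracy and reporting the error in percentage points can be illustrated with a deliberately simplified sketch. It fits an ordinary least squares line on a single hypothetical feature, whereas the models described above use several MSA, secondary structure, solvent accessibility and score parameters. The function names, the single-feature simplification and all numbers are assumptions for illustration, not the authors' implementation or data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variance")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def mean_abs_error_pp(predicted, observed):
    """Mean absolute error between two lists of fractions, in percentage points."""
    return 100.0 * mean([abs(p - o) for p, o in zip(predicted, observed)])


# Synthetic training data: one hypothetical MSA-derived feature per protein
# and the observed top-100 precision (as a fraction) for that protein.
train_feature = [1.0, 2.0, 3.0, 4.0]
train_precision = [0.10, 0.30, 0.50, 0.70]
intercept, slope = fit_line(train_feature, train_precision)

# Forecast precision for held-out proteins, clipped to the valid range [0, 1].
test_feature = [2.5, 3.5]
test_precision = [0.45, 0.55]
forecast = [min(1.0, max(0.0, intercept + slope * x)) for x in test_feature]
error_pp = mean_abs_error_pp(forecast, test_precision)
```

Expressing the error in percentage points rather than as a relative error matches how the result is reported: a forecast of 40% for a protein whose true precision is 47% is off by 7 percentage points.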

Availability and implementation: All data and scripts are available from http://comprec-lin.iiar.pwr.edu.pl/dcaQ/.

Contact: malgorzata.kotulska@pwr.edu.pl.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Data Accuracy
  • Models, Molecular
  • Mutation*
  • Protein Structure, Secondary*
  • Proteins / genetics
  • Sequence Analysis, Protein / methods*
  • Software*

Substances

  • Proteins