Forecasting residue-residue contact prediction accuracy

P P Wozniak; B M Konopka; J Xu; G Vriend; M Kotulska

doi:10.1093/bioinformatics/btx416

Bioinformatics. 2017 Nov 1;33(21):3405-3414. doi: 10.1093/bioinformatics/btx416.

Authors

P P Wozniak¹, B M Konopka¹, J Xu², G Vriend³, M Kotulska¹

Affiliations

¹ Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland.
² Toyota Technological Institute at Chicago, Chicago, IL 60637, USA.
³ Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, 6525 GA Nijmegen, The Netherlands.

Abstract

Motivation: Apart from meta-predictors, most of today's methods for residue-residue contact prediction are based entirely on Direct Coupling Analysis (DCA) of correlated mutations in multiple sequence alignments (MSAs). On average, about 40% of the 100 strongest contacts these methods predict for a protein are correct. An end-user working on a single protein of interest cannot tell whether the predictions for that protein are much more or much less accurate than 40%, which is a particular problem when predicted contacts are used to steer experimental research on that protein.
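The accuracy figure above is the fraction of a protein's 100 highest-scoring predicted residue pairs that are true contacts. A minimal Python sketch of that calculation follows. The function name, the (i, j) pair representation and the minimum sequence-separation filter are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # hypothetical representation: residue indices (i, j) with i < j


def top_k_precision(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    k: int = 100,
    min_separation: int = 6,
) -> float:
    """Fraction of the k highest-scoring predicted pairs that are true contacts.

    Assumption: pairs closer than `min_separation` residues in sequence are
    ignored, a common convention that may differ from the paper's setup.
    """
    # Keep only pairs far enough apart in sequence to be informative.
    candidates = [
        (score, pair)
        for pair, score in scores.items()
        if abs(pair[1] - pair[0]) >= min_separation
    ]
    # Rank by prediction score, strongest first.
    candidates.sort(key=lambda sp: sp[0], reverse=True)
    top = [pair for _, pair in candidates[:k]]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_contacts)
    return hits / len(top)
```

For example, with scores {(1, 10): 0.9, (2, 20): 0.8, (3, 5): 0.95, (4, 30): 0.1} and true contacts {(1, 10), (4, 30)}, the pair (3, 5) is filtered out by the separation rule, and the top-2 precision is 0.5.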

Results: We designed a regression model that forecasts the accuracy of residue-residue contact prediction for individual proteins with an average error of 7 percentage points. Contacts were predicted with two DCA methods (gplmDCA and PSICOV). The models use parameters that describe the MSA, the predicted secondary structure, the predicted solvent accessibility and the contact prediction scores for the target protein. Results show that our models can also be applied to meta-methods, as tested on RaptorX.
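The idea of forecasting a protein's contact prediction accuracy by regression, and reporting the average forecast error in percentage points, can be sketched as follows. This is a deliberately simplified, hypothetical illustration: ordinary least squares on a single feature stands in for the several MSA, secondary-structure, solvent-accessibility and score parameters the authors used, and the helper names are invented. It is not the authors' model.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y ≈ a + b * x for one hypothetical feature x.

    y is the observed accuracy (fraction of top predicted contacts correct).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("feature has zero variance; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def forecast_accuracy(a: float, b: float, x: float) -> float:
    """Forecast accuracy for a new protein, clamped to the valid range [0, 1]."""
    return min(1.0, max(0.0, a + b * x))


def mean_absolute_error_pp(observed: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute forecast error in percentage points (inputs are fractions)."""
    return 100 * mean(abs(o - f) for o, f in zip(observed, forecast))
```

Any error estimate of this kind is only meaningful when computed on proteins that were not used to fit the model, for example through cross-validation.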

Availability and implementation: All data and scripts are available from http://comprec-lin.iiar.pwr.edu.pl/dcaQ/.

Contact: malgorzata.kotulska@pwr.edu.pl.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

Algorithms
Computational Biology / methods*
Data Accuracy
Models, Molecular
Mutation*
Protein Structure, Secondary*
Proteins / genetics
Sequence Analysis, Protein / methods*
Software*

Substances

Proteins

Grants and funding

R01 GM089753/GM/NIGMS NIH HHS/United States