Improving accuracy of protein contact prediction using balanced network deconvolution

Proteins. 2015 Mar;83(3):485-96. doi: 10.1002/prot.24744. Epub 2015 Jan 24.

Abstract

Residue contact map is essential for protein three-dimensional structure determination. But most of the current contact prediction methods based on residue co-evolution suffer from high false-positives as introduced by indirect and transitive contacts (i.e., residues A-B and B-C are in contact, but A-C are not). Built on the work by Feizi et al. (Nat Biotechnol 2013; 31:726-733), which demonstrated a general network model to distinguish direct dependencies by network deconvolution, this study presents a new balanced network deconvolution (BND) algorithm to identify optimized dependency matrix without limit on the eigenvalue range in the applied network systems. The algorithm was used to filter contact predictions of five widely used co-evolution methods. On the test of proteins from three benchmark datasets of the 9th critical assessment of protein structure prediction (CASP9), CASP10, and PSICOV (precise structural contact prediction using sparse inverse covariance estimation) database experiments, the BND can improve the medium- and long-range contact predictions at the L/5 cutoff by 55.59% and 47.68%, respectively, without additional central processing unit cost. The improvement is statistically significant, with a P-value < 5.93 × 10(-3) in the Student's t-test. A further comparison with the ab initio structure predictions in CASPs showed that the usefulness of the current co-evolution-based contact prediction to the three-dimensional structure modeling relies on the number of homologous sequences existing in the sequence databases. BND can be used as a general contact refinement method, which is freely available at: http://www.csbio.sjtu.edu.cn/bioinf/BND/.

Keywords: filter; predictor; protein structure prediction; residue co-evolution; residue contact map; transitive noise.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Computer Simulation
  • Databases, Protein
  • Models, Molecular
  • Protein Conformation*
  • Proteins / chemistry*
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins