Automated evaluation of quaternary structures from protein crystals

PLoS Comput Biol. 2018 Apr 30;14(4):e1006104. doi: 10.1371/journal.pcbi.1006104. eCollection 2018 Apr.

Abstract

A correct assessment of the quaternary structure of proteins is a fundamental prerequisite to understanding their function, physico-chemical properties and mode of interaction with other proteins. Currently about 90% of structures in the Protein Data Bank are crystal structures, in which the correct quaternary structure is embedded in the crystal lattice among a number of crystal contacts. Computational methods are required to 1) classify all protein-protein contacts in crystal lattices as biologically relevant or crystal contacts and 2) provide an assessment of how the biologically relevant interfaces combine into a biological assembly. In our previous work we addressed the first problem with our EPPIC (Evolutionary Protein-Protein Interface Classifier) method. Here, we present our solution to the second problem with a new method that combines the interface classification results with symmetry and topology considerations. The new algorithm enumerates all possible valid assemblies within the crystal using a graph representation of the lattice and predicts the most probable biological unit based on the pairwise interface scoring. Our method achieves 85% precision (ranging from 76% to 90% for different oligomeric types) on a new dataset of 1,481 biological assemblies with consensus of PDB annotations. Although almost the same precision is achieved by PISA, currently the most popular quaternary structure assignment method, we show that, due to the fundamentally different approach to the problem, the two methods are complementary and could be combined to improve biological assembly assignments. The software for the automatic assessment of protein assemblies (EPPIC version 3) has been made available through a web server at http://www.eppic-web.org.
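
The enumeration idea described in the abstract can be illustrated with a toy sketch. This is not the EPPIC implementation. It assumes a small finite graph instead of a periodic lattice, it assumes each interface type carries an independent probability of being biologically relevant, and it omits the symmetry and topology validity checks entirely. The function name enumerate_assemblies, the input layout and the probability values are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def enumerate_assemblies(nodes, edges, probs):
    """Rank candidate assemblies of a toy contact graph.

    nodes: list of chain identifiers
    edges: list of (chain_u, chain_v, interface_type) tuples
    probs: dict mapping interface_type to an ASSUMED probability that
           the interface is biologically relevant
    Returns a list of (score, engaged_types, component_sizes), best first.
    """
    types = sorted(probs)
    results = []
    for r in range(len(types) + 1):
        for combo in combinations(types, r):
            engaged = frozenset(combo)

            # Score: product of p for engaged interface types and
            # (1 - p) for disengaged ones (independence is an assumption).
            score = 1.0
            for t in types:
                score *= probs[t] if t in engaged else 1.0 - probs[t]

            # Connected components of the graph restricted to engaged
            # edges, computed with a small union-find.
            parent = {n: n for n in nodes}

            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]
                    x = parent[x]
                return x

            for u, v, t in edges:
                if t in engaged:
                    parent[find(u)] = find(v)

            sizes = {}
            for n in nodes:
                root = find(n)
                sizes[root] = sizes.get(root, 0) + 1

            results.append((score, engaged, sorted(sizes.values())))

    # Highest-scoring candidate first.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Toy example: four chains, two interface types (made-up probabilities).
nodes = ["A", "B", "C", "D"]
edges = [("A", "B", 1), ("C", "D", 1), ("B", "C", 2), ("D", "A", 2)]
probs = {1: 0.9, 2: 0.2}

ranked = enumerate_assemblies(nodes, edges, probs)
best_score, best_engaged, best_sizes = ranked[0]
```

In this toy case the top-ranked candidate engages only interface type 1 and splits the four chains into two dimers. A realistic version would also have to reject candidates that do not form closed, symmetric assemblies, which is the part of the method that the symmetry and topology considerations address.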

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology
  • Crystallography, X-Ray / statistics & numerical data
  • Databases, Protein / statistics & numerical data
  • Models, Molecular
  • Protein Interaction Domains and Motifs
  • Protein Structure, Quaternary*
  • Proteins / chemistry*
  • Software

Substances

  • Proteins

Grants and funding

Financial support to GC from the Swiss National Science Foundation (grant 31003A 140879) and the Research Committee of the Paul Scherrer Institute (grants FK-05.08.1, FK-04.09.1) is gratefully acknowledged, as is IT support from SyBIT/SIS (ETH Zurich). This research was supported in part by the Intramural Research Program of the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (support to SB). Support for JD within the RCSB PDB comes from the National Science Foundation, the National Institutes of Health, and the Department of Energy (NSF DBI-1338415; Principal Investigator: Stephen K. Burley). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.