Coevolutionary Signals and Structure-Based Models for the Prediction of Protein Native Conformations

Methods Mol Biol. 2019;1851:83-103. doi: 10.1007/978-1-4939-8736-8_5.

Abstract

The analysis of coevolutionary signals from families of evolutionarily related sequences is a recent conceptual framework that provides valuable information about unique intramolecular interactions and, therefore, can assist in the elucidation of biomolecular conformations. It is based on the idea that compensatory mutations at specific residue positions in a sequence help preserve the stability of protein architecture and function, and leave a statistical signature related to residue-residue interactions in the 3D structure of the protein. Consequently, statistical analysis of these correlated mutations in subsets of protein sequence alignments can be used to predict which residue pairs should be in spatial proximity in the native functional protein fold. These predicted signals can then be used to guide molecular dynamics (MD) simulations to predict the three-dimensional coordinates of a functional amino acid chain. In this chapter, we introduce a general and efficient methodology to perform coevolutionary analysis on protein sequences and to use this information in combination with computational physical models to predict the native 3D conformation of functional polypeptides. We present a step-by-step methodology that includes the description and application of the software tools and databases required to infer the tertiary structure of a protein fold. The general pipeline includes instructions on (1) how to obtain direct amino acid couplings from protein sequences using direct coupling analysis (DCA), (2) how to incorporate such signals as interaction potentials in Cα structure-based models (SBMs) to drive protein-folding MD simulations, (3) a procedure to estimate secondary structure and how to include such estimates in the topology files required in the MD simulations, and (4) how to build full atomic models based on the top Cα candidates selected in the pipeline. The information presented in this chapter is self-contained and sufficient to allow a computational scientist to predict structures of proteins using publicly available algorithms and databases.
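
The idea that correlated mutations flag residue pairs in spatial proximity can be illustrated with a small sketch. Note that this is not DCA: it scores column pairs of an alignment by plain mutual information, a simpler local statistic that does not separate direct from indirect couplings and is used here only as a stand-in. The function names (`mutual_information`, `rank_pairs`), the `min_separation` parameter, and the toy alignment are hypothetical and do not come from the chapter.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        p_a = freq_a[a] / n
        p_b = freq_b[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def rank_pairs(msa, min_separation=1):
    """Rank column pairs (i, j) of an aligned sequence list by MI, highest first.

    min_separation is an illustrative filter on |i - j|; it is an assumption
    of this sketch, not a value taken from the chapter.
    """
    length = len(msa[0])
    columns = [[seq[i] for seq in msa] for i in range(length)]
    scored = []
    for i, j in combinations(range(length), 2):
        if j - i >= min_separation:
            scored.append(((i, j), mutual_information(columns[i], columns[j])))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy alignment (made up): columns 0 and 3 covary by construction,
# column 1 varies independently of them, and column 2 is fully conserved.
toy_msa = ["AGLK", "AVLK", "DGLA", "DVLA"]
ranked = rank_pairs(toy_msa)
top_pair, top_score = ranked[0]
print(top_pair, round(top_score, 3))
```

In a pipeline like the one outlined above, the top-ranked pairs from a proper DCA calculation, rather than from this simplified score, would be the candidates turned into contact potentials for the Cα structure-based model.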

Keywords: Coevolution; Energy landscapes; Molecular dynamics; Protein Folding; Structure prediction; Structure-based model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Molecular Dynamics Simulation
  • Protein Conformation
  • Protein Folding
  • Proteins / chemistry*
  • Proteins / metabolism*

Substances

  • Proteins