Engineering Proteins Using Statistical Models of Coevolutionary Sequence Information

Jerry C Dinan; James W McCormick; Kimberly A Reynolds

doi:10.1101/cshperspect.a041463


Cold Spring Harb Perspect Biol. 2024 Apr 1;16(4):a041463. doi: 10.1101/cshperspect.a041463.

Authors

Jerry C Dinan¹,²,³, James W McCormick¹,²,³, Kimberly A Reynolds²,³,⁴

Affiliations

¹ The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA.
² The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA.
³ The Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA.
⁴ The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA; kimberly.reynolds@utsouthwestern.edu.

PMID: 38110247
PMCID: PMC10982702 (available on 2026-04-01)

Abstract

Homologous protein sequences are wonderfully diverse, indicating many possible evolutionary "solutions" to the encoding of function. Consequently, one can construct statistical models of protein sequence by analyzing amino acid frequencies across a large multiple sequence alignment. A central premise is that covariance between amino acid positions reflects coevolution due to a shared functional or biophysical constraint. In this review, we describe the implementation and discuss the advantages, limitations, and recent progress of two coevolution-based modeling approaches: (1) Potts models of protein sequence (direct coupling analysis [DCA]-like), and (2) statistical coupling analysis (SCA). Each approach detects distinct features of protein sequence and structure: the former emphasizes local physical contacts throughout the structure, while the latter identifies larger evolutionarily coupled networks of residues. Recent advances in large-scale gene synthesis and high-throughput functional selection now motivate additional work to benchmark model performance across quantitative function prediction and de novo design tasks.
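As a rough illustration of the quantities the abstract refers to, the sketch below computes single-site frequencies f_i(a), pairwise frequencies f_ij(a,b), and the positional covariance C_ij(a,b) = f_ij(a,b) − f_i(a)f_j(b) from a toy alignment, and then scores a sequence under a Potts model with fields h and couplings J. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, alphabet, and toy alignment are hypothetical. It omits sequence reweighting, pseudocounts, the inference step that fits h and J to the alignment, and SCA's conservation weighting. It also assumes the sign convention E(s) = −Σ_i h_i(s_i) − Σ_{i<j} J_ij(s_i,s_j) with P(s) ∝ exp(−E(s)), which some papers write with the opposite sign.

```python
from itertools import combinations

# Hypothetical alphabet: 20 amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def single_site_frequencies(msa):
    """f_i(a): fraction of sequences carrying amino acid a at position i (unweighted, no pseudocounts)."""
    n_seq, n_pos = len(msa), len(msa[0])
    freqs = []
    for i in range(n_pos):
        column = [seq[i] for seq in msa]
        freqs.append({a: column.count(a) / n_seq for a in ALPHABET})
    return freqs


def pair_frequency(msa, i, j, a, b):
    """f_ij(a, b): fraction of sequences with a at position i and b at position j."""
    return sum(1 for seq in msa if seq[i] == a and seq[j] == b) / len(msa)


def covariance(msa, i, j, a, b, freqs):
    """C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b); zero when positions i and j vary independently."""
    return pair_frequency(msa, i, j, a, b) - freqs[i][a] * freqs[j][b]


def potts_energy(seq, h, J):
    """Assumed convention: E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    h: list of dicts, h[i][a]. J: dict keyed by (i, j) with i < j, whose
    values are dicts keyed by (a, b). Missing entries are treated as 0.
    Lower energy corresponds to higher model probability.
    """
    energy = -sum(h[i].get(a, 0.0) for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        energy -= J.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


if __name__ == "__main__":
    # Toy alignment in which positions 0 and 1 always change together.
    coupled = ["AC", "AC", "GD", "GD"]
    print(covariance(coupled, 0, 1, "A", "C", single_site_frequencies(coupled)))  # 0.25

    # Toy alignment in which the two positions vary independently.
    independent = ["AC", "AD", "GC", "GD"]
    print(covariance(independent, 0, 1, "A", "C", single_site_frequencies(independent)))  # 0.0

    # Hand-picked (not inferred) parameters for a two-position Potts model.
    h = [{"A": 1.0}, {"C": 0.5}]
    J = {(0, 1): {("A", "C"): 2.0}}
    print(potts_energy("AC", h, J))  # -3.5
```

In a real DCA-style workflow, h and J would be fit so that the model reproduces the alignment's observed single-site and pairwise frequencies; here they are set by hand only to show how a sequence is scored.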

Publication types

Review
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Amino Acids* / genetics
Biological Evolution
Evolution, Molecular
Models, Statistical
Proteins* / metabolism

Substances

Proteins
Amino Acids

Grants and funding

T32 GM131963/GM/NIGMS NIH HHS/United States