Hierarchy and extremes in selections from pools of randomized proteins

Proc Natl Acad Sci U S A. 2016 Mar 29;113(13):3482-7. doi: 10.1073/pnas.1517813113. Epub 2016 Mar 11.

Abstract

Variation and selection are the core principles of Darwinian evolution, but quantitatively relating the diversity of a population to its capacity to respond to selection is challenging. Here, we examine this problem at a molecular level in the context of populations of partially randomized proteins selected for binding to well-defined targets. We built several minimal protein libraries, screened them in vitro by phage display, and analyzed their response to selection by high-throughput sequencing. A statistical analysis of the results reveals two main findings. First, libraries with the same sequence diversity but built around different "frameworks" typically have vastly different responses; second, the distribution of responses of the best binders in a library follows a simple scaling law. We show how an elementary probabilistic model based on extreme value theory rationalizes the latter finding. Our results have implications for designing synthetic protein libraries, estimating the density of functional biomolecules in sequence space, characterizing diversity in natural populations, and experimentally investigating evolvability (i.e., the potential for future evolution).
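
The link between extreme value theory and a scaling law among the best binders can be illustrated with a toy simulation. This is a minimal sketch, not the authors' model or analysis. It assumes, purely for illustration, that the responses in a library are independent draws from a heavy-tailed Pareto distribution with tail index alpha; the abstract does not state the distribution, the exponent, or the fitting procedure. All function names and parameter values are hypothetical. Under this assumption, the r-th largest of N responses scales roughly as (N/r)^(1/alpha), so the top-ranked responses fall on a straight line in a log-log plot of response against rank, with slope close to -1/alpha.

```python
import math
import random


def sample_pareto(n, alpha, rng):
    """Draw n Pareto(alpha) values with minimum 1 by inverse-transform sampling.

    Assumption: responses are i.i.d. and heavy-tailed; this is an
    illustrative choice, not something stated in the abstract.
    """
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def top_values(values, k):
    """Return the k largest values in decreasing order (the 'best binders')."""
    return sorted(values, reverse=True)[:k]


def rank_slope(top):
    """Least-squares slope of log(value) against log(rank) for the top values."""
    xs = [math.log(rank) for rank in range(1, len(top) + 1)]
    ys = [math.log(value) for value in top]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def hill_estimate(top):
    """Hill estimator of 1/alpha, using the smallest top value as threshold."""
    threshold = top[-1]
    return sum(math.log(value / threshold) for value in top[:-1]) / (len(top) - 1)


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed for reproducibility
    library = sample_pareto(100_000, alpha=2.0, rng=rng)
    best = top_values(library, k=1_000)
    print("log-log rank slope:", rank_slope(best))   # expected near -1/alpha
    print("Hill estimate of 1/alpha:", hill_estimate(best))
```

In this toy setting both estimates recover the exponent from the top-ranked values alone, which is the sense in which a single scaling parameter can summarize the best responders of a library. Whether real selection data follow such a form, and with which exponent, is an empirical question that the sketch does not address.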

Keywords: antibodies; biological diversity; directed evolution; extreme values; phage display.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Cell Surface Display Techniques
  • Directed Molecular Evolution / methods*
  • Directed Molecular Evolution / statistics & numerical data
  • Escherichia coli / genetics
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Models, Statistical
  • Molecular Sequence Data
  • Peptide Library*
  • Proteins / chemistry*
  • Proteins / genetics*
  • Reproducibility of Results
  • Sequence Alignment

Substances

  • Peptide Library
  • Proteins