Minimum epistasis interpolation for sequence-function relationships

Nat Commun. 2020 Apr 14;11(1):1782. doi: 10.1038/s41467-020-15512-5.

Abstract

Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so that there is a need for techniques to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction where mutational effects change as little as possible across adjacent genetic backgrounds. The resulting models can capture complex higher-order genetic interactions near the data, but approach additivity where data is sparse or absent. We apply the method to high-throughput transcription factor binding assays and use it to explore a fitness landscape for protein G.
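The abstract describes choosing the reconstruction in which mutational effects change as little as possible across adjacent genetic backgrounds. The sketch below illustrates that idea and makes several simplifying assumptions. It treats only binary sequences of length L, with genotypes encoded as bit patterns. It measures epistasis on each double-mutant cycle (a square face of the hypercube) as f(b) - f(b+i) - f(b+j) + f(b+i+j). It then fills in unobserved genotypes by minimizing the unweighted sum of squared cycle epistasis while holding the measured phenotypes fixed. The function names `epistasis_operator` and `min_epistasis_impute` are hypothetical. The paper's exact weighting, its treatment of multi-letter alphabets and its handling of noisy data are not reproduced here.

```python
import itertools

import numpy as np


def epistasis_operator(L):
    """Matrix whose rows are double-mutant cycles on the binary hypercube
    (assumes L >= 2) and whose columns are the 2**L genotypes."""
    n = 2 ** L
    rows = []
    for i, j in itertools.combinations(range(L), 2):
        mi, mj = 1 << i, 1 << j
        for b in range(n):
            if b & mi or b & mj:
                continue  # background must carry the reference allele at sites i and j
            row = np.zeros(n)
            row[b] += 1.0            # f(b)
            row[b | mi] -= 1.0       # - f(b + mutation i)
            row[b | mj] -= 1.0       # - f(b + mutation j)
            row[b | mi | mj] += 1.0  # + f(b + both mutations)
            rows.append(row)
    return np.array(rows)


def min_epistasis_impute(L, observed):
    """observed: dict mapping genotype index -> measured phenotype.
    Returns a full phenotype vector that keeps the observed values and
    minimizes the total squared epistasis over all double-mutant cycles."""
    E = epistasis_operator(L)
    n = 2 ** L
    obs_list = sorted(observed)
    unobs_list = [g for g in range(n) if g not in observed]
    obs_idx = np.array(obs_list, dtype=int)
    unobs_idx = np.array(unobs_list, dtype=int)
    y = np.array([observed[g] for g in obs_list], dtype=float)
    # Split E f = E_o y + E_u f_u and solve min ||E_u f_u + E_o y||^2 for f_u.
    E_o, E_u = E[:, obs_idx], E[:, unobs_idx]
    f_u, *_ = np.linalg.lstsq(E_u, -E_o @ y, rcond=None)
    f = np.empty(n)
    f[obs_idx] = y
    f[unobs_idx] = f_u
    return f
```

For example, suppose only the reference genotype and the three single mutants of a 3-site landscape are observed. Then `min_epistasis_impute(3, {0: 1.0, 1: 3.0, 2: 4.0, 4: 0.0})` fills in every multiple mutant with the additive prediction. This happens because additive landscapes are exactly the ones with zero cycle epistasis. It mirrors the abstract's point that the reconstruction approaches additivity where data are sparse.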

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics
  • Computational Biology / methods
  • Epistasis, Genetic / genetics*
  • Genotype
  • Least-Squares Analysis
  • Models, Theoretical*

Substances

  • Bacterial Proteins
  • IgG Fc-binding protein, Streptococcus