Alignment-based Protein Mutational Landscape Prediction: Doing More with Less

Genome Biol Evol. 2023 Nov 1;15(11):evad201. doi: 10.1093/gbe/evad201.

Abstract

The wealth of genomic data has boosted the development of computational methods that predict the phenotypic outcomes of missense variants. The most accurate ones exploit multiple sequence alignments, which can be costly to generate. Recent efforts to democratize protein structure prediction have overcome this bottleneck by leveraging the fast homology search of MMseqs2. Here, we show the usefulness of this strategy for mutational outcome prediction through a large-scale assessment of 1.5M missense variants across 72 protein families. Our study demonstrates the feasibility of producing alignment-based mutational landscape predictions that are both high-quality and compute-efficient for entire proteomes. We provide the community with the whole human proteome mutational landscape and simplified access to our predictive pipeline.

Keywords: deep mutational scan; evolution; genotype–phenotype relationship; multiple sequence alignment; protein mutation.

MeSH terms

  • Computational Biology* / methods
  • Genomics
  • Humans
  • Mutation, Missense
  • Proteins* / chemistry
  • Sequence Alignment

Substances

  • Proteins