Differential Retention of Pfam Domains Contributes to Long-term Evolutionary Trends

Mol Biol Evol. 2023 Apr 4;40(4):msad073. doi: 10.1093/molbev/msad073.

Abstract

Protein domains that emerged more recently in evolution have a higher structural disorder and greater clustering of hydrophobic residues along the primary sequence. It is hard to explain how selection acting via descent with modification could act so slowly as not to saturate over the extraordinarily long timescales over which these trends persist. Here, we hypothesize that the trends were created by a higher level of selection that differentially affects the retention probabilities of protein domains with different properties. This hypothesis predicts that loss rates should depend on disorder and clustering trait values. To test this, we inferred loss rates via maximum likelihood for animal Pfam domains, after first performing a set of stringent quality control methods to reduce annotation errors. Intermediate trait values, matching those of ancient domains, are associated with the lowest loss rates, making our results difficult to explain with reference to previously described homology detection biases. Simulations confirm that effect sizes are of the right magnitude to produce the observed long-term trends. Our results support the hypothesis that differential domain loss slowly weeds out those protein domains that have nonoptimal levels of disorder and clustering. The same preferences also shape the differential diversification of Pfam domains, thereby further impacting proteome composition.

Keywords: Cope's rule; clade selection; gene families; intrinsic structural disorder; phylostratigraphy; protein evolution; protein folding.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Databases, Protein
  • Hydrophobic and Hydrophilic Interactions
  • Probability
  • Protein Domains
  • Proteome*

Substances

  • Proteome