Evolution is not Uniform Along Coding Sequences

Mol Biol Evol. 2023 Mar 4;40(3):msad042. doi: 10.1093/molbev/msad042.

Abstract

Amino acids evolve at different speeds within protein sequences, because their functional and structural roles are different. Notably, amino acids located at the surface of proteins are known to evolve more rapidly than those in the core. In particular, amino acids at the N- and C-termini of protein sequences are likely to be more exposed than those at the core of the folded protein due to their location in the peptidic chain, and they are known to be less structured. Because of these reasons, we would expect that amino acids located at protein termini would evolve faster than residues located inside the chain. Here we test this hypothesis and found that amino acids evolve almost twice as fast at protein termini compared with those in the center, hinting at a strong topological bias along the sequence length. We further show that the distribution of solvent-accessible residues and functional domains in proteins readily explain how structural and functional constraints are weaker at their termini, leading to the observed excess of amino acid substitutions. Finally, we show that the specific evolutionary rates at protein termini may have direct consequences, notably misleading in silico methods used to infer sites under positive selection within genes. These results suggest that accounting for positional information should improve evolutionary models.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Amino Acid Substitution
  • Amino Acids* / chemistry
  • Amino Acids* / genetics
  • Evolution, Molecular
  • Exons
  • Proteins* / chemistry
  • Proteins* / genetics

Substances

  • Proteins
  • Amino Acids