New amino acid substitution matrix brings sequence alignments into agreement with structure matches

Proteins. 2021 Jun;89(6):671-682. doi: 10.1002/prot.26050. Epub 2021 Feb 2.

Abstract

Protein sequence matching presently fails to identify many structures that are highly similar, even when they are known to have the same function. The high packing densities in globular proteins lead to interdependent substitutions, which have not previously been considered in measures of amino acid similarity. At present, sequence matching compares sequences based only on the similarities of single amino acids. It ignores the fact that in densely packed proteins there are additional conservative substitutions that represent exchanges between two interacting amino acids, such as a small-large pair changing to a large-small pair, even though the individual substitutions are not so conservative. Here we show that including information for such pairs of substitutions yields improved sequence matches, and that these yield significant gains in the agreement between sequence alignments and structure matches of the same protein pair. As a result, sequence segments are matched where the corresponding structure segments are aligned. There are gains for all 2002 collected cases in which the sequence alignments were not previously congruent with the structure matches. Our results also demonstrate a significant gain in detecting homology for "twilight zone" protein sequences. The derived amino acid substitution metrics have many other potential applications, including annotation, protein design, mutagenesis design, and the derivation of empirical potentials.
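The pair-substitution idea described in the abstract can be illustrated with a toy scoring function. This is a minimal sketch under stated assumptions, not the authors' method or matrix. The size classes, the single-residue scores, the +3 pair bonus, the contact list, and the helper names (`alignment_score`, `pair_bonus`, `congruence`) are all hypothetical, and the alignment is assumed to be ungapped. The sketch shows how a compensatory small-large to large-small exchange at two contacting positions can raise a score that single-residue terms alone would penalize. It also includes one simple, assumed way to measure agreement between a sequence alignment and a structure alignment.

```python
"""Toy illustration (hypothetical values, NOT the published matrix) of adding
a pair-substitution term to a single-residue substitution score."""

# Hypothetical residue size classes, chosen only for this illustration.
SMALL = set("AGSC")
LARGE = set("FWYLIMRK")


def size_class(aa: str) -> str:
    """Return 'S' (small), 'L' (large), or 'M' (anything else)."""
    if aa in SMALL:
        return "S"
    if aa in LARGE:
        return "L"
    return "M"


def single_score(a: str, b: str) -> float:
    """Hypothetical single-residue score: +2 for identity, -1 otherwise."""
    return 2.0 if a == b else -1.0


def pair_bonus(a1: str, a2: str, b1: str, b2: str) -> float:
    """Hypothetical pair term: reward a small-large pair becoming a
    large-small pair (or the reverse) at two positions in contact."""
    before = (size_class(a1), size_class(a2))
    after = (size_class(b1), size_class(b2))
    if {before, after} == {("S", "L"), ("L", "S")}:
        return 3.0
    return 0.0


def alignment_score(seq_a: str, seq_b: str, contacts, use_pairs: bool = True) -> float:
    """Score an ungapped alignment of two equal-length sequences.

    contacts: iterable of (i, j) position pairs assumed to be in contact
    in the folded structure (an assumed input, e.g. from a known structure).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    score = sum(single_score(a, b) for a, b in zip(seq_a, seq_b))
    if use_pairs:
        for i, j in contacts:
            score += pair_bonus(seq_a[i], seq_a[j], seq_b[i], seq_b[j])
    return score


def congruence(seq_pairs, struct_pairs) -> float:
    """Fraction of structure-aligned residue pairs (i, j) that the sequence
    alignment also pairs; a simple, assumed agreement measure."""
    struct = set(struct_pairs)
    if not struct:
        return 0.0
    return len(struct & set(seq_pairs)) / len(struct)


if __name__ == "__main__":
    # A (small) / K (large) at positions 0-1 become F (large) / G (small).
    print(alignment_score("AKVE", "FGVE", [(0, 1)]))                   # 5.0
    print(alignment_score("AKVE", "FGVE", [(0, 1)], use_pairs=False))  # 2.0
    print(congruence([(0, 0), (1, 1), (2, 3)],
                     [(0, 0), (1, 1), (2, 2), (3, 3)]))                # 0.5
```

In this toy case the two individual substitutions each cost -1, but the pair term recognizes the swap as a compensatory exchange between contacting residues. A real matrix would derive such pair scores from observed substitution statistics rather than from a fixed size rule.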

Keywords: amino acid similarities; amino acid substitution matrix; interdependent amino acid substitutions; protein sequence matching; protein structure matching.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Amino Acid Substitution*
  • Amino Acids / chemistry*
  • Amino Acids / metabolism
  • Databases, Protein
  • Datasets as Topic
  • Humans
  • Models, Molecular
  • Protein Engineering / methods
  • Proteins / chemistry*
  • Proteins / metabolism
  • Sequence Alignment
  • Sequence Homology, Amino Acid

Substances

  • Amino Acids
  • Proteins