Machine learning approaches demonstrate that protein structures carry information about their genetic coding

Sci Rep. 2022 Dec 20;12(1):21968. doi: 10.1038/s41598-022-25874-z.

Abstract

Synonymous codons translate into the same amino acid. Although the identity of synonymous codons is often considered inconsequential to the final protein structure, there is mounting evidence for an association between the two. Our study examined this association using regression and classification models. We found that codon sequences predict protein backbone dihedral angles with lower error than amino acid sequences do, and that models trained with true dihedral angles classify synonymous codons from structural information better than models trained with random dihedral angles. Using this classification approach, we investigated local codon-codon dependencies and tested whether synonymous codon identity can be predicted more accurately from codon context than from amino acid context alone and, more specifically, which codon context position carries the most predictive power.
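
To make the two central notions concrete, the following is a minimal Python sketch of (a) which codons count as synonymous under the standard genetic code and (b) one plausible way to measure the error between two dihedral angles. It is an illustration only, not the authors' code: the function names are hypothetical, and the wrap-around angular error is an assumption, since the abstract does not state which error metric the study used.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons enumerated
# in T, C, A, G order at each position; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (length divisible by 3) to amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_codons(amino_acid: str) -> list[str]:
    """Return all codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def angular_error(predicted_deg: float, true_deg: float) -> float:
    """Absolute difference between two angles in degrees, with wrap-around.

    Assumed metric for illustration: dihedral angles are periodic, so
    170 and -170 degrees are 20 degrees apart, not 340.
    """
    diff = abs(predicted_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


if __name__ == "__main__":
    # Two different codon sequences that encode the same peptide.
    print(translate("CTGTTA"), translate("TTACTG"))
    print(synonymous_codons("L"))
    print(angular_error(170.0, -170.0))
```

In this framing, the amino acid sequence is a lossy summary of the codon sequence: both example sequences translate to the same peptide, so any structural signal that distinguishes them is invisible at the amino acid level.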

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Amino Acids* / genetics
  • Codon / genetics
  • Proteins* / chemistry
  • Proteins* / genetics

Substances

  • Proteins
  • Codon
  • Amino Acids