End-to-End Protein Normal Mode Frequency Predictions Using Language and Graph Models and Application to Sonification

ACS Nano. 2022 Dec 27;16(12):20656-20670. doi: 10.1021/acsnano.2c07681. Epub 2022 Nov 23.

Abstract

The prediction of mechanical and dynamical properties of proteins is an important frontier, especially given the growing availability of protein structures. Here we report a series of models that provide end-to-end predictions of nanodynamical properties of proteins, focused on high-throughput normal mode predictions directly from the amino acid sequence. Using neural network models from the families of natural language processing and graph-based methods, we offer atomistically based mechanistic predictions of key protein mechanical features. The models include an end-to-end long short-term memory (LSTM) model, an end-to-end transformer model, a graph-based transformer model, and an equivariant graph neural network. All four models show exceptional performance, with the graph-based transformer architecture offering the best results, but at the cost of requiring a graph structure as input. In contrast, the LSTM and transformer models offer end-to-end sequence-to-property prediction capabilities, providing efficient avenues for protein engineering, analysis, and design. We compare our results against published data based on a Principal Neighborhood Aggregation graph neural network, revealing that the transformer model offers better performance while also predicting the first 64 normal mode frequencies simultaneously. The end-to-end transformer model may facilitate other downstream applications through transfer learning, and it offers a comprehensive prediction of dynamical properties directly from the amino acid sequence, without any structural knowledge. We demonstrate a potential application in scientific sonification, where the normal mode frequencies are transposed to generate audible signals for a detailed analysis of subtle changes in protein sequences.
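The transposition step mentioned for sonification can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 220 Hz base pitch, the equal-amplitude mixing, and the example mode values are all assumptions made for illustration. The only idea taken from the abstract is that a set of normal mode frequencies is shifted into the audible range; here this is done by multiplying every frequency by one constant, which preserves the ratios between modes.

```python
import math


def transpose_to_audible(freqs, base_hz=220.0):
    """Scale all frequencies by one constant so the lowest maps to base_hz.

    A uniform multiplicative shift keeps the ratios between modes intact.
    base_hz is an assumed target pitch, not a value from the paper.
    """
    if not freqs or min(freqs) <= 0:
        raise ValueError("need a non-empty list of positive frequencies")
    factor = base_hz / min(freqs)
    return [f * factor for f in freqs]


def synthesize(audible_hz, duration_s=0.5, sample_rate=8000):
    """Mix equal-amplitude sine waves; dividing by the count keeps samples in [-1, 1]."""
    n_samples = int(duration_s * sample_rate)
    n_tones = len(audible_hz)
    return [
        sum(math.sin(2.0 * math.pi * f * t / sample_rate) for f in audible_hz) / n_tones
        for t in range(n_samples)
    ]


# Hypothetical mode frequencies in arbitrary units (not data from the paper).
modes = [2.0, 3.0, 5.0]
tones = transpose_to_audible(modes)
samples = synthesize(tones)
```

With this mapping, two sequences whose predicted modes differ slightly would produce tones with slightly different intervals, which is the kind of subtle change the sonification is meant to expose. The sample list could be written to an audio file with any standard audio library.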

Keywords: Biomaterials; attention models; deep learning; materiomics; mechanics; proteins.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural

MeSH terms

  • Amino Acid Sequence
  • Learning
  • Neural Networks, Computer*
  • Proteins* / chemistry

Substances

  • Proteins