Prediction of Thermostability of Enzymes Based on the Amino Acid Index (AAindex) Database and Machine Learning

Molecules. 2023 Dec 15;28(24):8097. doi: 10.3390/molecules28248097.

Abstract

Combining wet-lab experimental data on multi-site combinatorial mutations with machine learning is an innovative approach in protein engineering. In this study, we used the innovative sequence-activity relationship (innov'SAR) methodology, which is based on novel descriptors and digital signal processing (DSP), to construct a predictive model. Experimental data on 21 (R)-selective amine transaminases from Aspergillus terreus (AT-ATA) were used as input to predict mutants with higher thermostability than those predicted from the existing data. We improved the model's coefficient of determination (R²) from 0.66 to 0.92. In addition, the root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), solvent-accessible surface area (SASA), number of hydrogen bonds, and radius of gyration were computed from molecular dynamics simulations, and the differences between the predicted mutants and the wild type (WT) were analyzed. The successful application of the innov'SAR algorithm to improving the thermostability of AT-ATA may aid directed evolution screening and open new avenues for protein engineering.
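The abstract names AAindex-based descriptors and DSP but does not give the encoding details. As a rough illustration of the general idea behind DSP-style sequence descriptors, the sketch below maps each residue to one AAindex property value, takes the magnitude of the fast Fourier transform as a "protein spectrum", and computes R² for a set of predictions. It is a minimal sketch under stated assumptions, not the study's implementation: the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) is used only as an example index, and the helper names (`encode`, `protein_spectrum`, `r_squared`), the mean-centring step, the zero-padding length, and the toy sequences are all hypothetical.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale (AAindex KYTJ820101). Used here only as an
# example; the AAindex entries actually selected in the study are not stated.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq, scale=KYTE_DOOLITTLE):
    """Hypothetical helper: map a sequence to one index value per residue."""
    return np.array([scale[aa] for aa in seq.upper()], dtype=float)


def protein_spectrum(seq, n_points=None, scale=KYTE_DOOLITTLE):
    """Hypothetical helper: FFT magnitude of the encoded sequence.

    The signal is mean-centred (a choice made in this sketch) and zero-padded
    to n_points so that sequences of different lengths yield equal-length
    feature vectors. Only non-negative frequencies are returned (rfft).
    """
    signal = encode(seq, scale)
    signal = signal - signal.mean()  # remove the DC offset
    n = n_points or len(signal)
    return np.abs(np.fft.rfft(signal, n=n))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy, made-up sequences (not AT-ATA data), each differing by one residue.
    variants = ["MKTAYIAKQR", "MKTAYIAKQL", "MKVAYIAKQR"]
    X = np.vstack([protein_spectrum(s, n_points=16) for s in variants])
    print(X.shape)  # (3, 9): one 16-point rfft spectrum per variant
    print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

In a full workflow, a feature matrix such as `X` would be paired with measured thermostability values and fed to a regression model, whose cross-validated predictions are then scored with a metric like `r_squared`; the choice of regressor here is left open because the abstract does not specify it.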

Keywords: artificial intelligence; directed evolution; extended sequence; machine learning; molecular dynamics simulation; thermostability.

MeSH terms

  • Amino Acids* / genetics
  • Enzyme Stability
  • Machine Learning
  • Molecular Dynamics Simulation
  • Protein Engineering*
  • Transaminases / metabolism

Substances

  • Amino Acids
  • Transaminases

Grants and funding

This research was financially supported by the National Natural Science Foundation of China (Grant nos. 20904047, 21673207, 21873087), the Natural Science Foundation of Zhejiang Province (Grant no. LY17A040001), and the ZUST Postgraduate Research and Innovation Fund (2022yjskc22).