Protein structure prediction using the evolutionary algorithm USPEX

Proteins. 2023 Jul;91(7):933-943. doi: 10.1002/prot.26478. Epub 2023 Mar 2.

Abstract

Protein structure prediction is one of the major problems of modern biophysics: current attempts to predict the tertiary protein structure from the amino acid sequence are successful mostly when the use of big data and machine learning allows one to reduce the "prediction problem" to the "problem of recognition". Compared with the recent successes of deep learning, classical predictive methods lag behind in accuracy when predicting stable conformations. Therefore, in this work we extended the evolutionary algorithm USPEX to predict protein structures by global optimization, starting from the amino acid sequence. We also compared frequently used force fields for protein structure prediction. Protein structure relaxation and energy calculations were performed using the Tinker (with several different force fields) and Rosetta (with the REF2015 scoring function) codes. To create new protein structure models in the USPEX algorithm, we developed novel variation operators. Testing the new method on seven proteins up to 100 residues long that (for simplicity) contain no cis-proline residues (ω ≈ 0°) showed that our algorithm predicts protein tertiary structures with high accuracy. A comparison of the final potential energies of the structures predicted by USPEX and by the Rosetta Abinitio approach showed that, in most cases, the developed algorithm found structures with similar or even lower energies (Amber/Charmm/Oplsaal) and scoring-function values (REF2015). While USPEX clearly demonstrated its ability to find very deep energy minima, our study showed that existing force fields are not accurate enough for reliable blind prediction of protein structures without further experimental verification.
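The idea of evolutionary global optimization over protein conformations can be illustrated with a minimal Python sketch. This is not the USPEX implementation and does not reproduce the paper's novel variation operators. It assumes a protein is encoded only by one backbone (φ, ψ) torsion pair per residue, with ω fixed at trans (consistent with excluding cis-proline). It uses a hypothetical `toy_energy` in place of Amber, CHARMM, OPLS-AA, or REF2015 energies, and it skips the local relaxation that a real workflow would perform with Tinker or Rosetta. The three operators (one-point crossover, Gaussian torsion mutation, and random generation) are generic stand-ins, and all function names are illustrative.

```python
import math
import random

# Hypothetical encoding: one (phi, psi) backbone torsion pair per residue, in degrees.
# omega is assumed fixed at ~180 degrees (trans, no cis-proline), so it is not stored.
Structure = list[tuple[float, float]]

HELIX_PHI, HELIX_PSI = -60.0, -45.0  # toy preference for an alpha-helical region


def wrap(angle: float) -> float:
    """Map an angle in degrees onto [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def toy_energy(s: Structure) -> float:
    """Hypothetical stand-in for a force-field energy (lower is better).

    Each torsion pays 1 - cos(deviation from the helical value), so the minimum is 0.
    """
    return sum(
        (1.0 - math.cos(math.radians(wrap(phi - HELIX_PHI))))
        + (1.0 - math.cos(math.radians(wrap(psi - HELIX_PSI))))
        for phi, psi in s
    )


def random_structure(n: int, rng: random.Random) -> Structure:
    """Random operator: draw every torsion uniformly."""
    return [(rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0)) for _ in range(n)]


def crossover(a: Structure, b: Structure, rng: random.Random) -> Structure:
    """Heredity-like operator: N-terminal segment of one parent, C-terminal of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(s: Structure, rng: random.Random, rate: float = 0.2, sigma: float = 30.0) -> Structure:
    """Mutation operator: Gaussian kicks to the torsions of a random subset of residues."""
    return [
        (wrap(phi + rng.gauss(0.0, sigma)), wrap(psi + rng.gauss(0.0, sigma)))
        if rng.random() < rate
        else (phi, psi)
        for phi, psi in s
    ]


def evolve(n_residues: int, pop_size: int = 30, generations: int = 100, seed: int = 0):
    """Return (best_structure, best_energy) after an elitist evolutionary search.

    Requires n_residues >= 2 and pop_size >= 6 (so crossover has two elite parents).
    """
    rng = random.Random(seed)
    pop = [random_structure(n_residues, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # A real workflow would locally relax each structure (e.g. with Tinker or
        # Rosetta) before ranking; this sketch ranks unrelaxed toy energies.
        pop.sort(key=toy_energy)
        elite = pop[: pop_size // 3]
        offspring = list(elite)  # elitism: the best structures survive unchanged
        while len(offspring) < pop_size:
            r = rng.random()
            if r < 0.5:
                p1, p2 = rng.sample(elite, 2)
                offspring.append(crossover(p1, p2, rng))
            elif r < 0.9:
                offspring.append(mutate(rng.choice(elite), rng))
            else:
                offspring.append(random_structure(n_residues, rng))
        pop = offspring
    best = min(pop, key=toy_energy)
    return best, toy_energy(best)


if __name__ == "__main__":
    best, energy = evolve(20, seed=1)
    print(f"best toy energy: {energy:.3f}")
```

Because the elite survives unchanged into each new generation, the best energy found never increases from one generation to the next. With a realistic force field, each evaluation would instead involve a full relaxation, which dominates the cost of such a search.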

Keywords: USPEX; evolutionary algorithm; protein folding; protein structure prediction; variation operator.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Protein Conformation
  • Protein Structure, Tertiary
  • Proteins* / chemistry

Substances

  • Proteins