Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information

Int J Mol Sci. 2023 Oct 27;24(21):15656. doi: 10.3390/ijms242115656.

Abstract

Protein structure prediction continues to pose multiple challenges despite outstanding progress that is largely attributable to the use of novel machine learning techniques. One of the widely used representations of local 3D structure-protein blocks (PBs)-can be treated in a similar way to secondary structure classes. Here, we present a new approach for predicting local conformation in terms of PB classes solely from amino acid sequences. We apply the RMSD metric to ensure unambiguous future 3D protein structure recovery. The selection of statistically assessed features is a key component of the proposed method. We suggest that ML input features should be created from the statistically significant predictors that are derived from the amino acids' physicochemical properties and the resolved structures' statistics. The statistical significance of the suggested features was assessed using a stepwise regression analysis that permitted the evaluation of the contribution and statistical significance of each predictor. We used the set of 380 statistically significant predictors as a learning model for the regression neural network that was trained using the PISCES30 dataset. When using the same dataset and metrics for benchmarking, our method outperformed all other methods reported in the literature for the CB513 nonredundant dataset (for the PBs, Q16 = 81.01%, and for the DSSP, Q3 = 85.99% and Q8 = 79.35%).

Keywords: DSSP (Dictionary of Secondary Structure in Proteins); machine learning (ML); protein blocks (PBs); protein local structure; protein local structure prediction; protein secondary structure prediction.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Amino Acids / chemistry
  • Neural Networks, Computer*
  • Protein Structure, Secondary
  • Proteins* / chemistry

Substances

  • Proteins
  • Amino Acids