Machine Learning Approaches for Quality Assessment of Protein Structures

Biomolecules. 2020 Apr 17;10(4):626. doi: 10.3390/biom10040626.

Abstract

Protein structures play a very important role in biomedical research, especially in drug discovery and design, which require accurate protein structures in advance. However, experimental determination of protein structures is prohibitively costly and time-consuming, and computational prediction of protein structures has not been perfected. Methods that assess the quality of protein models can help in selecting the most accurate candidates for further work. Driven by this demand, many structural bioinformatics laboratories have developed methods for estimating model accuracy (EMA). In recent years, EMA methods based on machine learning (ML) have consistently ranked among the top-performing methods in the community-wide CASP challenge. Accordingly, we systematically review all the major ML-based EMA methods developed within the past ten years. The methods are grouped by the ML approach they employ (support vector machines, artificial neural networks, ensemble learning, or Bayesian learning), and their significance is discussed from a methodological viewpoint. To orient the reader, we also briefly describe the background of EMA, including the CASP challenge and its evaluation metrics, and introduce the major ML and deep learning (DL) techniques. Overall, this review provides an introductory guide to modern research on protein quality assessment and directions for future research in this area.

Keywords: CASP; DL; EMA; ML; MQA; deep learning; estimating model quality; machine learning; model quality assessment; protein structure prediction.

Publication types

  • Review

MeSH terms

  • Amino Acid Sequence
  • Bayes Theorem
  • Humans
  • Machine Learning*
  • Models, Molecular
  • Neural Networks, Computer
  • Proteins / chemistry*
  • Support Vector Machine

Substances

  • Proteins