Machine Learning Approaches for Quality Assessment of Protein Structures

Jiarui Chen; Shirley W I Siu

doi:10.3390/biom10040626

Biomolecules. 2020 Apr 17;10(4):626. doi: 10.3390/biom10040626.

Authors

Jiarui Chen¹, Shirley W I Siu¹

Affiliation

¹ Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China.

Abstract

Protein structures play a very important role in biomedical research, especially in drug discovery and design, which require accurate protein structures in advance. However, experimental determinations of protein structure are prohibitively costly and time-consuming, and computational predictions of protein structures have not been perfected. Methods that assess the quality of protein models can help in selecting the most accurate candidates for further work. Driven by this demand, many structural bioinformatics laboratories have developed methods for estimating model accuracy (EMA). In recent years, EMA methods based on machine learning (ML) have consistently ranked among the top-performing methods in the community-wide CASP challenge. Accordingly, we systematically review all the major ML-based EMA methods developed within the past ten years. The methods are grouped by the ML approach they employ (support vector machine, artificial neural networks, ensemble learning, or Bayesian learning), and their significance is discussed from a methodological viewpoint. To orient the reader, we also briefly describe the background of EMA, including the CASP challenge and its evaluation metrics, and introduce the major ML and deep learning (DL) techniques. Overall, this review provides an introductory guide to modern research on protein quality assessment and directions for future research in this area.

Keywords: CASP; DL; EMA; ML; MQA; deep learning; estimating model quality; machine learning; model quality assessment; protein structure prediction.

Publication types

Review

MeSH terms

Amino Acid Sequence
Bayes Theorem
Humans
Machine Learning*
Models, Molecular
Neural Networks, Computer
Proteins / chemistry*
Support Vector Machine

Substances

Proteins

Grants and funding

MYRG2017-00146-FST/Universidade de Macau/International