Machine learning techniques for protein function prediction

Proteins. 2020 Mar;88(3):397-413. doi: 10.1002/prot.25832. Epub 2019 Nov 14.

Abstract

Proteins play important roles in living organisms, and their function is directly linked with their structure. Because of the growing gap between the number of proteins being discovered and the number that have been functionally characterized (in particular as a result of experimental limitations), reliable prediction of protein function through computational means has become crucial. This paper reviews the machine learning techniques used in the literature, following their evolution from simple algorithms such as logistic regression to more advanced methods like support vector machines and modern deep neural networks. Hyperparameter optimization methods adopted to boost prediction performance are presented. In parallel, the paper reviews the metamorphosis in the features used by these algorithms, from classical physicochemical properties and amino acid composition to text-derived features from biomedical literature and learned feature representations using autoencoders, together with feature selection and dimensionality reduction techniques. Success stories in the application of these techniques to both general and specific protein function prediction are discussed.
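
As an illustration of the simplest feature type named in the abstract, amino acid composition can be computed as the fraction of each of the 20 standard residues in a sequence. The following is a minimal sketch and is not taken from the reviewed paper; the function name `amino_acid_composition`, the alphabet ordering, and the choice to ignore non-standard residues are assumptions made for this example.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Hypothetical helper for illustration: non-standard symbols (e.g. X, B, Z)
    are ignored, so the fractions sum to 1 over the standard residues only.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

A fixed-length vector of this kind could then be passed to a classifier such as logistic regression, one of the simple algorithms the abstract mentions.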

Keywords: deep learning; feature selection; machine learning; protein function prediction.

Publication types

  • Review

MeSH terms

  • Amino Acids / chemistry*
  • Computational Biology / statistics & numerical data*
  • Humans
  • Logistic Models
  • Machine Learning*
  • Multifactor Dimensionality Reduction
  • Neural Networks, Computer
  • Proteins / chemistry
  • Proteins / physiology*
  • Saccharomyces cerevisiae / chemistry
  • Structure-Activity Relationship

Substances

  • Amino Acids
  • Proteins