Automatic Gene Function Prediction in the 2020's

Genes (Basel). 2020 Oct 27;11(11):1264. doi: 10.3390/genes11111264.

Abstract

The current rate at which new DNA and protein sequences are being generated is too fast for their functions to be discovered experimentally, which emphasizes the need for accurate Automatic Function Prediction (AFP) methods. AFP has been an active and growing research field for decades and has made considerable progress in that time. However, it is certainly not solved. In this paper, we describe challenges that the AFP field still has to overcome to increase its applicability. The challenges we consider are how to: (1) include condition-specific functional annotation, (2) predict functions for non-model species, (3) include new informative data sources, (4) deal with the biases of Gene Ontology (GO) annotations, and (5) maximally exploit the GO to obtain performance gains. We also provide recommendations for addressing those challenges by adapting (1) the way we represent proteins and genes, (2) the way we represent gene functions, and (3) the algorithms that perform the prediction from gene to function. Together, we show that AFP is still a vibrant research area that can benefit from continuing advances in machine learning, with which AFP in the 2020s can again take a large step forward, reinforcing the power of computational biology.

Keywords: Gene Ontology; automatic function prediction; machine learning; protein representation.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Algorithms
  • Amino Acid Sequence / genetics
  • Computational Biology / methods*
  • Electronic Data Processing
  • Gene Ontology*
  • Machine Learning
  • Models, Biological
  • Molecular Sequence Annotation / methods*
  • Proteins / genetics
  • Proteins / metabolism*

Substances

  • Proteins