Protein function prediction with gene ontology: from traditional to deep learning models

PeerJ. 2021 Aug 24;9:e12019. doi: 10.7717/peerj.12019. eCollection 2021.

Abstract

Protein function prediction is a crucial part of genome annotation. Prediction methods have developed rapidly in recent years, owing to the emergence of high-throughput sequencing technologies. Among the available databases for identifying protein function terms, Gene Ontology (GO) is an important resource that describes the functional properties of proteins. Researchers employ various approaches to predict GO terms efficiently. Meanwhile, deep learning, a fast-evolving family of data-driven approaches, shows impressive potential for assigning GO terms to amino acid sequences. Herein, we review the currently available computational GO annotation methods for proteins, ranging from conventional to deep learning approaches. Further, we select some suitable predictors from among the reviewed tools and conduct a mini comparison of their performance using a worldwide challenge dataset (CAFA3). Finally, we discuss the major remaining challenges in the field and outline future directions for protein function prediction with GO.

Keywords: Annotation; CAFA3; Deep learning; Gene Ontology; Machine learning; Protein function prediction.

Associated data

  • figshare/10.6084/m9.figshare.8135393.v3

Grants and funding

This research was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (NRF-2019R1A2C1084308). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.