Leveraging transformers-based language models in proteome bioinformatics

Nguyen Quoc Khanh Le

doi:10.1002/pmic.202300011

Proteomics. 2023 Dec;23(23-24):e2300011. doi: 10.1002/pmic.202300011. Epub 2023 Jun 29.

Author

Nguyen Quoc Khanh Le ¹ ² ³ ⁴

Affiliations

¹ Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan.
² AIBioMed Research Group, Taipei Medical University, Taipei, Taiwan.
³ Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan.
⁴ Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, Taiwan.

PMID: 37381841

Abstract

In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining to analyze biological data. Recently, transformer-based NLP models have gained significant attention for their ability to process variable-length input sequences in parallel, using self-attention mechanisms to capture long-range dependencies. In this review paper, we discuss the recent advancements in transformer-based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review provides valuable insights into the potential of transformer-based NLP models to revolutionize proteome bioinformatics.

Keywords: bioinformatics; deep learning; drug discovery; explainable artificial intelligence; natural language processing; protein expression; protein function prediction; transformer attention.

Publication types

Review

MeSH terms

Computational Biology*
Data Mining
Machine Learning
Natural Language Processing
Proteome*

Substances

Proteome
