Leveraging transformers-based language models in proteome bioinformatics

Proteomics. 2023 Dec;23(23-24):e2300011. doi: 10.1002/pmic.202300011. Epub 2023 Jun 29.

Abstract

In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining to analyze biological data. Recently, transformer-based NLP models have gained significant attention for their ability to process variable-length input sequences in parallel, using self-attention mechanisms to capture long-range dependencies. In this review paper, we discuss the recent advancements in transformer-based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review provides valuable insights into the potential of transformer-based NLP models to revolutionize proteome bioinformatics.

Keywords: bioinformatics; deep learning; drug discovery; explainable artificial intelligence; natural language processing; protein expression; protein function prediction; transformer attention.

Publication types

  • Review

MeSH terms

  • Computational Biology*
  • Data Mining
  • Machine Learning
  • Natural Language Processing
  • Proteome*

Substances

  • Proteome