Recent advances in features generation for membrane protein sequences: From multiple sequence alignment to pre-trained language models

Proteomics. 2023 Dec;23(23-24):e2200494. doi: 10.1002/pmic.202200494. Epub 2023 Oct 20.

Abstract

Membrane proteins play a crucial role in various cellular processes and are essential components of cell membranes. Computational methods have emerged as a powerful tool for studying membrane proteins due to their complex structures and properties that make them difficult to analyze experimentally. Traditional features for protein sequence analysis based on amino acid types, composition, and pair composition have limitations in capturing higher-order sequence patterns. Recently, multiple sequence alignment (MSA) and pre-trained language models (PLMs) have been used to generate features from protein sequences. However, the significant computational resources required for MSA-based features generation can be a major bottleneck for many applications. Several methods and tools have been developed to accelerate the generation of MSAs and reduce their computational cost, including heuristics and approximate algorithms. Additionally, the use of PLMs such as BERT has shown great potential in generating informative embeddings for protein sequence analysis. In this review, we provide an overview of traditional and more recent methods for generating features from protein sequences, with a particular focus on MSAs and PLMs. We highlight the advantages and limitations of these approaches and discuss the methods and tools developed to address the computational challenges associated with features generation. Overall, the advancements in computational methods and tools provide a promising avenue for gaining deeper insights into the function and properties of membrane proteins, which can have significant implications in drug discovery and personalized medicine.

Keywords: machine learning; membrane proteins; pre-trained language model.

Publication types

  • Review

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Animals
  • Computational Biology / methods
  • Horses
  • Membrane Proteins*
  • Sequence Alignment
  • Sequence Analysis, Protein

Substances

  • Membrane Proteins