Finding functional motifs in protein sequences with deep learning and natural language models

Curr Opin Struct Biol. 2023 Aug:81:102641. doi: 10.1016/j.sbi.2023.102641. Epub 2023 Jun 28.

Abstract

Recently, the prediction of structural/functional motifs in protein sequences has taken advantage of powerful machine learning-based approaches. For protein encoding, protein language models are being adopted and are outperforming standard procedures. Different combinations of machine learning methods and encoding schemes are available for predicting different structural/functional motifs. Particularly interesting is the use of protein language models to encode proteins in addition to evolutionary information and physicochemical parameters. A thorough analysis of recent predictors developed for annotating transmembrane regions, sorting signals, lipidation sites and phosphorylation sites makes it possible to survey the state of the art, focusing on the relevance of protein language models for the different tasks. This analysis highlights that more experimental data are needed to fully exploit the powerful machine learning methods now available.
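As an illustration of what "combining encoding schemes" can mean at the residue level, here is a minimal sketch, not taken from any of the reviewed predictors. It builds one feature vector per residue by concatenating a one-hot amino acid encoding with a physicochemical parameter (the Kyte-Doolittle hydropathy value). It then optionally appends further per-residue blocks computed elsewhere, such as evolutionary profile rows or protein language model embeddings. The function names `encode_residue` and `encode_sequence` are hypothetical, and the choice of hydropathy as the physicochemical feature is an assumption made for the example.

```python
from typing import Dict, List, Sequence

# Canonical amino acid alphabet; the index of each letter is its one-hot position.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here as one example physicochemical parameter.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_residue(aa: str) -> List[float]:
    """One-hot (20 values) plus hydropathy (1 value) for a single residue.

    Non-canonical residues (e.g. 'X') get an all-zero one-hot and hydropathy 0.0.
    """
    one_hot = [1.0 if aa == letter else 0.0 for letter in AMINO_ACIDS]
    return one_hot + [KYTE_DOOLITTLE.get(aa, 0.0)]


def encode_sequence(
    seq: str, *extra_blocks: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Per-residue feature matrix: one row per residue.

    Each optional extra block must hold one vector per residue (hypothetical
    inputs such as profile rows or language model embeddings computed
    elsewhere); its vectors are appended to the matching rows.
    """
    seq = seq.upper()
    rows = [encode_residue(aa) for aa in seq]
    for block in extra_blocks:
        if len(block) != len(seq):
            raise ValueError("each extra block needs exactly one vector per residue")
        rows = [row + list(vec) for row, vec in zip(rows, block)]
    return rows


if __name__ == "__main__":
    features = encode_sequence("MKV")
    print(len(features), len(features[0]))  # 3 residues, 21 features each
```

The resulting matrix (sequence length by feature dimension) is the kind of input a per-residue classifier would consume. Appending blocks rather than replacing them mirrors the idea of using language model embeddings in addition to, not instead of, evolutionary and physicochemical information.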

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Deep Learning*
  • Machine Learning
  • Proteins

Substances

  • Proteins