Protein Design with Deep Learning

Int J Mol Sci. 2021 Oct 29;22(21):11741. doi: 10.3390/ijms222111741.

Abstract

Computational Protein Design (CPD) has produced impressive results for engineering new proteins, resulting in a wide variety of applications. In the past few years, various efforts have aimed at replacing or improving existing design methods using Deep Learning (DL) technology to leverage the large amount of publicly available protein data. DL is a very powerful tool to extract patterns from raw data, provided that the data are formatted as mathematical objects and the architecture processing them is well suited to the targeted problem. In the case of protein data, specific representations are needed for both the amino acid sequence and the protein structure in order to capture 1D and 3D information, respectively. As no consensus has been reached about the most suitable representations, this review describes the representations used so far, discusses their strengths and weaknesses, and details their associated DL architectures for design and related tasks.
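
To make "formatted as mathematical objects" concrete, the following is a minimal sketch of two commonly used encodings: a one-hot matrix for the amino acid sequence (1D information) and a pairwise distance matrix for the structure (3D information). These two choices are illustrative assumptions, not necessarily the representations the review recommends; the function names, the toy peptide, and its coordinates are all hypothetical.

```python
import math

# The 20 standard amino acids, ordered alphabetically by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_sequence(sequence):
    """Encode a sequence of length L as an L x 20 one-hot matrix (1D information)."""
    encoded = []
    for residue in sequence:
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[residue]] = 1  # raises KeyError on non-standard letters
        encoded.append(row)
    return encoded


def distance_matrix(coords):
    """Encode L points as an L x L matrix of Euclidean distances (3D information)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


# Hypothetical toy input: a 3-residue peptide with made-up C-alpha
# coordinates (in angstroms); not taken from any real structure.
sequence = "ACD"
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]

x_seq = one_hot_sequence(sequence)      # 3 x 20 matrix
x_struct = distance_matrix(ca_coords)   # 3 x 3 symmetric matrix
```

A distance matrix is unchanged by rotating or translating the structure, which is one reason it is often preferred over raw coordinates as network input. The one-hot encoding, by contrast, carries no information about physicochemical similarity between residues.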

Keywords: artificial neural network; computational protein design; deep learning; generative models; inverse folding problem; language models; protein structure.

Publication types

  • Review

MeSH terms

  • Computational Biology*
  • Deep Learning*
  • Protein Domains
  • Protein Engineering*
  • Proteins* / chemistry
  • Proteins* / genetics

Substances

  • Proteins