Graphormer supervised de novo protein design method and function validation

Brief Bioinform. 2024 Mar 27;25(3):bbae135. doi: 10.1093/bib/bbae135.

Abstract

Protein design is central to nearly all protein engineering problems, as it can enable the creation of proteins with new biological functions, such as improving the catalytic efficiency of enzymes. One key facet of protein design, fixed-backbone protein sequence design, seeks to design new sequences that will conform to a prescribed protein backbone structure. Nonetheless, existing sequence design methods present limitations, such as low sequence diversity and shortcomings in experimental validation of the designed functional proteins. These inadequacies obstruct the goal of functional protein design. To improve these limitations, we initially developed the Graphormer-based Protein Design (GPD) model. This model utilizes the Transformer on a graph-based representation of three-dimensional protein structures and incorporates Gaussian noise and a sequence random masks to node features, thereby enhancing sequence recovery and diversity. The performance of the GPD model was significantly better than that of the state-of-the-art ProteinMPNN model on multiple independent tests, especially for sequence diversity. We employed GPD to design CalB hydrolase and generated nine artificially designed CalB proteins. The results show a 1.7-fold increase in catalytic activity compared to that of the wild-type CalB and strong substrate selectivity on p-nitrophenyl acetate with different carbon chain lengths (C2-C16). Thus, the GPD method could be used for the de novo design of industrial enzymes and protein drugs. The code was released at https://github.com/decodermu/GPD.

Keywords: GPD model; Graphormer architecture; function validation; protein sequence design.

MeSH terms

  • Amino Acid Sequence
  • Protein Engineering* / methods
  • Proteins* / chemistry

Substances

  • Proteins