Rational incorporation of any unnatural amino acid into proteins by machine learning on existing experimental proofs

Comput Struct Biotechnol J. 2022 Sep 5:20:4930-4941. doi: 10.1016/j.csbj.2022.08.063. eCollection 2022.

Abstract

Unnatural amino acid (UAA) incorporation through genetic code expansion has been used extensively in protein engineering over the last two decades. Mutating residues to UAAs offers additional dimensions for tuning protein structure and function. However, the large library of available UAAs and the varied contexts of mutation sites across different proteins call for rational UAA incorporation guided by artificial intelligence. Here we collected existing experimental proofs of UAA-incorporated proteins from the literature and established a database of known UAA substitution sites. Through program design and machine learning on this database, we showed that UAA incorporation into proteins is predictable from the observed evolutionary, steric and physicochemical factors. Using the predicted probability of successful UAA substitution, we tested model performance on literature-reported and newly designed experimental proofs, and demonstrated its potential for screening UAA-incorporated proteins. This work extends structure-based computational biology and virtual screening to UAA-incorporated proteins, and offers a useful tool for automating the rational design of proteins with any UAA.

Keywords: Genetic code expansion; Machine learning; Protein design; Unnatural amino acid incorporation; Virtual screening.
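
The abstract does not specify the model or the exact features used, so the following is only a minimal sketch of the general idea: learn a probability of successful substitution from per-site descriptors, then rank candidate sites by that probability. It assumes a plain logistic regression trained by gradient descent. The three features (one evolutionary, one steric, one physicochemical), the function names and every number in the toy data are hypothetical placeholders, not values from the study or its database.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent (illustrative only)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(z) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_success(weights, bias, features):
    """Predicted probability that a UAA substitution at this site succeeds."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def rank_sites(weights, bias, candidates):
    """Sort candidate sites from most to least likely to tolerate the UAA."""
    scored = [(name, predict_success(weights, bias, f)) for name, f in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical descriptors per site, each scaled to [0, 1]:
# (residue conservation, side-chain volume mismatch, hydrophobicity mismatch)
# Label 1 = reported successful incorporation, 0 = unsuccessful. Made-up toy data.
TRAIN_X = [
    (0.1, 0.2, 0.1), (0.2, 0.1, 0.3), (0.3, 0.3, 0.2),
    (0.9, 0.8, 0.7), (0.8, 0.9, 0.9), (0.7, 0.7, 0.8),
]
TRAIN_Y = [1, 1, 1, 0, 0, 0]

WEIGHTS, BIAS = train_logistic(TRAIN_X, TRAIN_Y)

# Hypothetical candidate sites to screen.
CANDIDATES = {
    "site_A": (0.15, 0.15, 0.15),
    "site_B": (0.85, 0.85, 0.85),
}
RANKED = rank_sites(WEIGHTS, BIAS, CANDIDATES)

if __name__ == "__main__":
    for name, prob in RANKED:
        print(f"{name}: predicted success probability {prob:.2f}")
```

In a screening setting, the ranked list would be cut at a probability threshold or at the top few sites to decide which substitutions to test experimentally. A real application would need features computed from sequence alignments and protein structures, and a model validated on held-out experimental data.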