Identification of Macrocyclic Peptide Families from Combinatorial Libraries Containing Noncanonical Amino Acids Using Cheminformatics and Bioinformatics Inspired Clustering

Man-Ling Lee; Sherif Farag; Joselyn S Del Cid; Charlene Bashore; Kenneth K Hallenbeck; Alberto Gobbi; Christian N Cunningham

doi:10.1021/acschembio.3c00159

ACS Chem Biol. 2023 Jun 16;18(6):1425-1434. doi: 10.1021/acschembio.3c00159. Epub 2023 May 23.

Authors

Man-Ling Lee¹, Sherif Farag¹, Joselyn S Del Cid², Charlene Bashore³, Kenneth K Hallenbeck², Alberto Gobbi¹, Christian N Cunningham²

Affiliations

¹ Discovery Chemistry, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States.
² Peptide Therapeutics, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States.
³ Biological Chemistry, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States.

Abstract

In the past decade, macrocyclic peptides have gained increasing interest as a new therapeutic modality for tackling intracellular and extracellular therapeutic targets previously classified as "undruggable". Several technological advances have made discovering macrocyclic peptides against these targets possible: 1) the inclusion of noncanonical amino acids (NCAAs) in mRNA display, 2) the increased availability of next-generation sequencing (NGS), and 3) improvements in rapid peptide synthesis platforms. Because DNA sequencing is the functional output of this platform, this type of directed-evolution-based screening can produce large numbers of potential hit sequences. The current standard for selecting hit peptides for downstream follow-up relies on frequency counting and sorting of unique peptide sequences, which can produce false negatives for technical reasons, including low translation efficiency or other experimental factors. To overcome our inability to detect weakly enriched peptide sequences in our large data sets, we sought to develop a clustering method that would enable the identification of peptide families. Unfortunately, traditional clustering algorithms, such as ClustalW, cannot be used for this technology because these libraries incorporate NCAAs. Therefore, we developed a new atomistic clustering method with a Pairwise Aligned Peptide (PAP) chemical similarity metric to perform sequence alignments and identify macrocyclic peptide families. With this method, weakly enriched peptides, including isolated sequences (singletons), can now be clustered into families, providing a comprehensive analysis of NGS data from macrocycle discovery selections. Additionally, once a hit peptide with the desired activity has been identified, this clustering algorithm can be used to identify derivatives from the initial data set for structure-activity relationship (SAR) analysis without requiring additional selection experiments.
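
The contrast between frequency ranking and family clustering can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not define the PAP metric, so `toy_similarity` is a placeholder (fraction of identical positions between equal-length peptides) standing in for a chemistry-aware score. The residue token `MeF`, the example reads, and the 0.8 threshold are all hypothetical.

```python
from collections import Counter


def rank_by_frequency(reads):
    """Count identical peptide sequences and sort by descending count."""
    return Counter(reads).most_common()


def toy_similarity(a, b):
    """Placeholder score, NOT the PAP metric: fraction of identical positions.

    Peptides are tuples of residue tokens so that noncanonical residues can be
    written as multi-character labels. Unequal lengths score 0.0 because this
    sketch performs no alignment.
    """
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster(peptides, threshold):
    """Single-linkage clustering: join any pair scoring at or above threshold."""
    parent = list(range(len(peptides)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(peptides)):
        for j in range(i + 1, len(peptides)):
            if toy_similarity(peptides[i], peptides[j]) >= threshold:
                parent[find(i)] = find(j)

    families = {}
    for i, peptide in enumerate(peptides):
        families.setdefault(find(i), []).append(peptide)
    return list(families.values())


# Hypothetical reads: one enriched peptide, one unrelated peptide, and one
# singleton that differs from the enriched peptide at a single position.
enriched = ("C", "W", "MeF", "L", "Y", "C")
unrelated = ("C", "G", "P", "A", "S", "C")
singleton = ("C", "W", "MeF", "I", "Y", "C")
reads = [enriched] * 5 + [unrelated] * 2 + [singleton]

ranked = rank_by_frequency(reads)
unique_peptides = [sequence for sequence, _ in ranked]
families = cluster(unique_peptides, threshold=0.8)

for sequence, count in ranked:
    print(count, "-".join(sequence))
for family in families:
    print("family:", ["-".join(p) for p in family])
```

In this toy data set the singleton ranks last by count, yet it is grouped with the most enriched peptide once sequences are compared pairwise. A realistic pipeline would replace `toy_similarity` with an alignment-based, atom-level similarity that can score NCAAs.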

MeSH terms

Amino Acids* / genetics
Cheminformatics*
Cluster Analysis
Computational Biology
Humans
Peptide Library
Peptides / chemistry

Substances

Amino Acids
Peptides
Peptide Library