Benchmarking antibody clustering methods using sequence, structural, and machine learning similarity measures for antibody discovery applications

Dawid Chomicz; Jarosław Kończak; Sonia Wróbel; Tadeusz Satława; Paweł Dudzic; Bartosz Janusz; Mateusz Tarkowski; Piotr Deszyński; Tomasz Gawłowski; Anna Kostyn; Marek Orłowski; Tomasz Klaus; Lukas Schulte; Kyle Martin; Stephen R Comeau; Konrad Krawczyk

doi:10.3389/fmolb.2024.1352508

Front Mol Biosci. 2024 Mar 28:11:1352508. doi: 10.3389/fmolb.2024.1352508. eCollection 2024.

Authors

Dawid Chomicz¹, Jarosław Kończak¹, Sonia Wróbel¹, Tadeusz Satława¹, Paweł Dudzic¹, Bartosz Janusz¹, Mateusz Tarkowski¹, Piotr Deszyński¹, Tomasz Gawłowski¹, Anna Kostyn², Marek Orłowski² ³, Tomasz Klaus², Lukas Schulte⁴, Kyle Martin⁵, Stephen R Comeau⁵, Konrad Krawczyk¹

Affiliations

¹ NaturalAntibody, Szczecin, West Pomeranian, Poland.
² Pure Biologics, Wrocław, Poland.
³ Department of Biochemistry, Molecular Biology and Biotechnology, Faculty of Chemistry, Wrocław University of Science and Technology, Wrocław, Poland.
⁴ Global Computational Biology & Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach, Germany.
⁵ Biotherapeutics Discovery, Boehringer Ingelheim, Biberach, Germany.

Abstract

Antibodies are proteins produced by our immune system that have been harnessed as biotherapeutics. The discovery of antibody-based therapeutics relies on analyzing large volumes of diverse sequences derived from phage display or animal immunizations. Suitable therapeutic candidates are identified by grouping the sequences by similarity and then selecting a diverse set of antibodies for further tests. Such groupings are typically created using sequence-similarity measures alone. Maximizing diversity among the selected candidates is crucial to reducing the number of tests performed on molecules with near-identical properties. With advances in structural modeling and machine learning, antibodies can now be grouped along other dimensions of diversity, such as predicted paratopes or three-dimensional structures. Here we benchmarked antibody grouping methods using clonotype, sequence, paratope prediction, structure prediction, and embedding information. The methods were evaluated on two tasks: binder detection and epitope mapping. We demonstrate that on binder detection no method appears to outperform the others, while on epitope mapping, clonotype, paratope, and embedding clusterings are the top performers. Most importantly, the methods propose orthogonal groupings, so combining multiple methods offers a more diverse pool of candidates than any single method alone. To facilitate exploring the diversity of antibodies using different methods, we have created an online tool, CLAP, available at clap.naturalantibody.com, that allows users to group, contrast, and visualize antibodies using the different grouping methods.
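As an illustration of the "group by similarity, then select a diverse set" workflow described in the abstract, the following is a minimal sketch of a clonotype-style grouping. It is not the authors' implementation and is not how CLAP works internally. It assumes one common convention for clonotypes (same heavy-chain V gene, same CDR-H3 length, and CDR-H3 identity above a threshold); the function names, the dictionary keys `id`, `v_gene`, and `cdrh3`, the 0.8 threshold, and the example sequences are all hypothetical.

```python
from collections import defaultdict


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_by_clonotype(antibodies, min_identity=0.8):
    """Greedy clonotype-style grouping (assumed convention, see lead-in).

    antibodies: list of dicts with hypothetical keys 'id', 'v_gene', 'cdrh3'.
    Returns a list of clusters, each a list of antibody dicts.
    """
    # Step 1: bucket by V gene and CDR-H3 length, so that only
    # equal-length CDR-H3 sequences are ever compared.
    buckets = defaultdict(list)
    for ab in antibodies:
        buckets[(ab["v_gene"], len(ab["cdrh3"]))].append(ab)

    # Step 2: within each bucket, assign every antibody to the first
    # cluster whose seed (first member) is similar enough, else start
    # a new cluster with this antibody as seed.
    clusters = []
    for members in buckets.values():
        bucket_clusters = []
        for ab in members:
            for cluster in bucket_clusters:
                if identity(ab["cdrh3"], cluster[0]["cdrh3"]) >= min_identity:
                    cluster.append(ab)
                    break
            else:
                bucket_clusters.append([ab])
        clusters.extend(bucket_clusters)
    return clusters


def pick_representatives(clusters):
    """Select one candidate (the cluster seed) per cluster."""
    return [cluster[0]["id"] for cluster in clusters]


# Made-up example records, for illustration only.
example = [
    {"id": "ab1", "v_gene": "IGHV3-23", "cdrh3": "ARDYYGSSYFDY"},
    {"id": "ab2", "v_gene": "IGHV3-23", "cdrh3": "ARDYYGSGYFDY"},
    {"id": "ab3", "v_gene": "IGHV3-23", "cdrh3": "ARGGWELLRFDY"},
    {"id": "ab4", "v_gene": "IGHV1-69", "cdrh3": "ARDYYGSSYFDY"},
]

clusters = cluster_by_clonotype(example)
representatives = pick_representatives(clusters)
```

In this toy input, ab1 and ab2 differ at a single CDR-H3 position and fall into one group, while ab3 (dissimilar CDR-H3) and ab4 (different V gene) each form their own group. Swapping the similarity function for a paratope-, structure-, or embedding-based measure would yield different groupings from the same input, which is the kind of orthogonality the abstract refers to.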

Keywords: antibodies; biologics and biosimilars; clustering; drug discovery; language models (LMs); machine learning.

Grants and funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.