Computational epitope binning reveals functional equivalence of sequence-divergent paratopes

Comput Struct Biotechnol J. 2022 Apr 30;20:2169-2180. doi: 10.1016/j.csbj.2022.04.036. eCollection 2022.

Abstract

The therapeutic efficacy of a protein binder largely depends on two factors: its binding site and its binding affinity. Advances in in vitro library display screening and next-generation sequencing have accelerated the development of strong binders, yet identifying their binding sites remains a major challenge. Differentiating, or "binning", binders into groups that recognize distinct binding sites on their target is a promising approach that facilitates high-throughput screening for binders that may show different biological activities. Here we study the extent to which the information contained in the amino acid sequences of a set of target-specific binders can be leveraged to bin them, inferring the functional equivalence of their binding regions, or paratopes, directly from comparisons of the sequences, their modeled structures, or their modeled interactions. Using a leucine-rich repeat binding scaffold known as a "repebody" as the source of diversity in recognition of interleukin-6 (IL-6), we show that the "Epibin" approach introduced here effectively uses structural modeling and docking to extract the specificity information encoded in the repebody amino acid sequences, and thereby successfully recapitulates the IL-6 binding competition observed in immunoassays. Furthermore, our computational binning provided a basis for designing in vitro mutagenesis experiments to pinpoint specificity-determining residues. Finally, we demonstrate that the Epibin approach can be extended to antibodies by retrospectively comparing its predictions with the results of antigen-specific antibody competition studies. The study thus demonstrates the utility of modeling the structure and binding of different binders against the same target from their amino acid sequences, and paves the way for larger-scale binning and analysis of entire repertoires.

Keywords: AU-PRC, Area under the precision-recall curve; Docking; Epitope; Epitope binning; IL-6, Interleukin-6; LRR, leucine-rich repeat; PCC, Pearson correlation coefficient; Paratope equivalence; Pro, Proline; Protein binder; RMSD, Root-mean-square deviation; Repebody; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
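The binning idea described in the abstract, grouping binders whose paratopes appear functionally equivalent based on pairwise comparisons, can be illustrated with a generic sketch. This is not the Epibin implementation: the pairwise `score` function, the `threshold`, the binder names (`rb1` to `rb4`), the example scores, and the choice of single-linkage clustering via union-find are all hypothetical assumptions for illustration. Any sequence-, structure-, or docking-based similarity could be plugged in as `score`.

```python
from itertools import combinations


def bin_binders(names, score, threshold):
    """Group binders into bins by single-linkage clustering (illustrative only).

    `score(a, b)` is a hypothetical pairwise similarity between two binders
    (e.g. derived from sequences, modeled structures, or docked poses).
    Two binders are linked when their score meets `threshold`; bins are the
    connected components of the resulting graph.
    """
    parent = {n: n for n in names}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair whose similarity passes the threshold.
    for a, b in combinations(names, 2):
        if score(a, b) >= threshold:
            parent[find(a)] = find(b)

    # Collect members of each connected component.
    bins = {}
    for n in names:
        bins.setdefault(find(n), []).append(n)
    # Sort members and bins so the output is deterministic.
    return sorted(sorted(members) for members in bins.values())


# Hypothetical pairwise similarity scores between four binders.
pairwise = {
    frozenset({"rb1", "rb2"}): 0.9,
    frozenset({"rb2", "rb3"}): 0.8,
    frozenset({"rb1", "rb3"}): 0.4,
    frozenset({"rb1", "rb4"}): 0.1,
    frozenset({"rb2", "rb4"}): 0.2,
    frozenset({"rb3", "rb4"}): 0.3,
}
names = ["rb1", "rb2", "rb3", "rb4"]
bins = bin_binders(names, lambda a, b: pairwise[frozenset({a, b})], 0.7)
print(bins)  # [['rb1', 'rb2', 'rb3'], ['rb4']]
```

Single linkage is the simplest choice but can chain binders together: here `rb1` and `rb3` share a bin through `rb2` even though their direct score is low. When bins should reflect strict paratope equivalence, a stricter scheme (for example, requiring every pair within a bin to pass the threshold) may be preferable.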