BERTrand – peptide:TCR binding prediction using Bidirectional Encoder Representations from Transformers augmented with random TCR pairing

Alexander Myronov; Giovanni Mazzocco; Paulina Król; Dariusz Plewczynski

doi:10.1093/bioinformatics/btad468

Bioinformatics. 2023 Aug 1;39(8):btad468. doi: 10.1093/bioinformatics/btad468.

Authors

Alexander Myronov¹,², Giovanni Mazzocco², Paulina Król², Dariusz Plewczynski¹

Affiliations

¹ Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland.
² Ardigen, Krakow, Poland.

Abstract

Motivation: The advent of T-cell receptor (TCR) sequencing experiments has led to a significant increase in the amount of available peptide:TCR binding data, and a number of machine-learning models have appeared in recent years. High-quality prediction models for a fixed epitope sequence are feasible, provided that enough TCR sequences known to bind it are available. However, their performance drops significantly for previously unseen peptides.

Results: We prepare a dataset of known peptide:TCR binders and augment it with negative decoys created from healthy donors' T-cell repertoires. We employ deep learning methods commonly applied in Natural Language Processing to train a peptide:TCR binding model with a degree of cross-peptide generalization (0.69 AUROC). We demonstrate that BERTrand outperforms previously published methods when evaluated on peptide sequences not used during model training.
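The decoy-generation and evaluation strategy described above can be illustrated with a minimal Python sketch. It is not the BERTrand code and the authors' procedure may differ in its details: the function names (`make_decoys`, `split_by_peptide`, `auroc`), the decoys-per-positive `ratio`, and the toy identifiers are all hypothetical, and the sketch assumes that healthy-donor TCRs can be treated as non-binders for the peptides they are randomly paired with. AUROC is computed here in its rank-based form (the probability that a random positive outscores a random negative).

```python
import random
from collections import Counter

# Hypothetical sketch of random TCR pairing; not the BERTrand implementation.

def make_decoys(positives, reference_tcrs, ratio=3, seed=0):
    """Create negative (peptide, TCR) pairs by random pairing.

    positives: list of (peptide, tcr) pairs known to bind.
    reference_tcrs: TCRs from healthy donors, ASSUMED to be non-binders.
    ratio: hypothetical number of decoys per positive pair.
    """
    rng = random.Random(seed)
    known = set(positives)
    pool = sorted(set(reference_tcrs))  # deduplicate; sort for reproducibility
    decoys = []
    for peptide, n_pos in sorted(Counter(p for p, _ in positives).items()):
        # Never relabel a known binder as a negative for the same peptide.
        candidates = [t for t in pool if (peptide, t) not in known]
        k = min(n_pos * ratio, len(candidates))
        decoys.extend((peptide, t) for t in rng.sample(candidates, k))
    return decoys


def split_by_peptide(rows, test_peptides):
    """Hold out whole peptides so the test set contains only unseen epitopes."""
    train = [r for r in rows if r[0] not in test_peptides]
    test = [r for r in rows if r[0] in test_peptides]
    return train, test


def auroc(labels, scores):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with made-up identifiers (not real sequences).
positives = [("PEP1", "TCR_A"), ("PEP1", "TCR_B"), ("PEP2", "TCR_C")]
reference = ["TCR_A", "TCR_D", "TCR_E", "TCR_F", "TCR_G"]
decoys = make_decoys(positives, reference, ratio=2, seed=1)
dataset = [(p, t, 1) for p, t in positives] + [(p, t, 0) for p, t in decoys]
train, test = split_by_peptide(dataset, {"PEP2"})
```

Holding out entire peptides, rather than individual pairs, is what makes such an evaluation measure cross-peptide generalization: with a random pair-level split, the model would see every test peptide during training.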

Availability and implementation: The datasets and the code for model training are available at https://github.com/SFGLab/bertrand.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Epitopes
Machine Learning
Peptides* / metabolism
Protein Binding
Receptors, Antigen, T-Cell* / metabolism

Substances

Peptides
Receptors, Antigen, T-Cell
Epitopes