BERTrand - peptide:TCR binding prediction using Bidirectional Encoder Representations from Transformers augmented with random TCR pairing

Bioinformatics. 2023 Aug 1;39(8):btad468. doi: 10.1093/bioinformatics/btad468.

Abstract

Motivation: The advent of T-cell receptor (TCR) sequencing experiments has significantly increased the amount of available peptide:TCR binding data, and a number of machine-learning models have appeared in recent years. High-quality prediction models for a fixed epitope sequence are feasible, provided that enough known binding TCR sequences are available. However, their performance drops significantly for previously unseen peptides.

Results: We prepare a dataset of known peptide:TCR binders and augment it with negative decoys created using healthy donors' T-cell repertoires. We employ deep learning methods commonly applied in Natural Language Processing to train a peptide:TCR binding model with a degree of cross-peptide generalization (0.69 AUROC). We demonstrate that BERTrand outperforms the published methods when evaluated on peptide sequences not used during model training.
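
The evaluation setup described in the Results can be illustrated with a minimal sketch. It is not the BERTrand implementation (see the repository for that): the function names `make_decoys`, `split_by_peptide` and `auroc`, the decoy ratio, the test fraction and the toy sequences are all assumptions made for illustration. The sketch pairs each peptide with randomly drawn reference TCRs to form negative decoys, holds out whole peptides so that test peptides are never seen in training, and computes AUROC as the fraction of binder/decoy pairs ranked correctly.

```python
import random


def make_decoys(binders, reference_tcrs, ratio=3, seed=0):
    """Create negative examples by random TCR pairing (illustrative only).

    binders: list of (peptide, tcr) pairs assumed to bind.
    reference_tcrs: TCR sequences assumed to come from a reference repertoire.
    ratio: hypothetical number of decoys drawn per known binder.
    """
    rng = random.Random(seed)
    known = set(binders)
    decoys = []
    for peptide, _ in binders:
        # Exclude TCRs already known to bind this peptide.
        candidates = [t for t in reference_tcrs if (peptide, t) not in known]
        for tcr in rng.sample(candidates, min(ratio, len(candidates))):
            decoys.append((peptide, tcr, 0))
    return decoys


def split_by_peptide(examples, test_fraction=0.25, seed=0):
    """Split (peptide, tcr, label) examples so no peptide is in both sets."""
    rng = random.Random(seed)
    peptides = sorted({p for p, _, _ in examples})
    rng.shuffle(peptides)
    n_test = max(1, round(len(peptides) * test_fraction))
    test_peptides = set(peptides[:n_test])
    train = [e for e in examples if e[0] not in test_peptides]
    test = [e for e in examples if e[0] in test_peptides]
    return train, test


def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with made-up sequences (not taken from the paper's dataset).
toy_binders = [("AAKLMNPQV", "CASSAAAF"), ("GGTRVWYHL", "CASSGGGF")]
toy_reference = ["CASSAAAF", "CASSKKKF", "CASSLLLF", "CASSMMMF"]
toy_examples = [(p, t, 1) for p, t in toy_binders]
toy_examples += make_decoys(toy_binders, toy_reference, ratio=2)
toy_train, toy_test = split_by_peptide(toy_examples)
```

Splitting by peptide rather than by individual pair is what makes the reported score a measure of cross-peptide generalization: a random split over pairs would let the same peptide appear on both sides and would measure something easier.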

Availability and implementation: The datasets and the code for model training are available at https://github.com/SFGLab/bertrand.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Epitopes
  • Machine Learning
  • Peptides* / metabolism
  • Protein Binding
  • Receptors, Antigen, T-Cell* / metabolism

Substances

  • Peptides
  • Receptors, Antigen, T-Cell
  • Epitopes