Attention-aware contrastive learning for predicting T cell receptor-antigen binding specificity

Brief Bioinform. 2022 Nov 19;23(6):bbac378. doi: 10.1093/bib/bbac378.

Abstract

Motivation: It has been shown that only a small fraction of the neoantigens presented by major histocompatibility complex (MHC) class I molecules on the cell surface can elicit a T cell response. This restriction can be attributed to the binding specificity between the T cell receptor (TCR) and the peptide-MHC complex (pMHC). Computational prediction of T cell binding to neoantigens remains a challenging and unresolved task.

Results: In this paper, we proposed an attention-aware contrastive learning model, ATMTCR, to infer TCR-pMHC binding specificity. For each TCR sequence, we used a transformer encoder to transform it into a latent representation, and then masked a percentage of its amino acids, guided by attention weights, to generate its contrastive view. We verified that, compared to a fully supervised baseline model, contrastive learning-based pretraining on large-scale TCR sequences significantly improved prediction performance on downstream tasks. Interestingly, masking a percentage of amino acids with low attention weights yielded the best performance among the masking strategies tested. Comparison experiments on two independent datasets demonstrated that our method achieved better performance than other existing algorithms. Moreover, we identified important amino acids and their positional preferences through attention weights, which indicated the potential interpretability of our proposed model.
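
The attention-guided masking step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_low_attention`, the mask token "X", the masking fraction, the example sequence and the hard-coded attention weights are all assumptions made for illustration. In the actual model the weights would come from the transformer encoder.

```python
import math


def mask_low_attention(sequence, attention, fraction=0.2, mask_token="X"):
    """Return a copy of `sequence` with its lowest-attention residues masked.

    Hypothetical helper: `fraction` is the share of residues to mask and
    `mask_token` is a placeholder symbol; neither is taken from the paper.
    """
    if len(sequence) != len(attention):
        raise ValueError("sequence and attention must have the same length")
    n_mask = math.floor(len(sequence) * fraction)
    # Rank positions by ascending attention weight. sorted() is stable,
    # so ties are broken by position in the sequence.
    order = sorted(range(len(sequence)), key=lambda i: attention[i])
    masked_positions = set(order[:n_mask])
    return "".join(
        mask_token if i in masked_positions else residue
        for i, residue in enumerate(sequence)
    )


# Illustrative CDR3-like sequence with made-up per-residue attention weights.
cdr3 = "CASSLGQAYEQYF"
weights = [0.02, 0.03, 0.05, 0.06, 0.12, 0.15, 0.14,
           0.13, 0.10, 0.08, 0.06, 0.04, 0.02]

view = mask_low_attention(cdr3, weights, fraction=0.25)
print(view)  # XXSSLGQAYEQYX
```

In a contrastive setup, the original sequence and its masked view would typically be treated as a positive pair. Masking low-attention positions leaves the residues the encoder weights most heavily intact, which is one plausible reading of why this strategy performed best.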

Keywords: T cell receptor; TCR–antigen binding; attention mechanism; contrastive learning; neoantigen.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / metabolism
  • Attention
  • HLA Antigens
  • Histocompatibility Antigens Class I / metabolism
  • Protein Binding
  • Receptors, Antigen, T-Cell*
  • T-Lymphocytes*

Substances

  • Receptors, Antigen, T-Cell
  • Histocompatibility Antigens Class I
  • HLA Antigens
  • Amino Acids