Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency

Front Immunol. 2023 Apr 18;14:1128326. doi: 10.3389/fimmu.2023.1128326. eCollection 2023.

Abstract

The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Various models now aim to predict TCR-pMHC binding, but a standard dataset and procedure for comparing their performance is still missing. In this work we provide a general method for data collection, preprocessing, splitting, and generation of negative examples, as well as comprehensive datasets for comparing TCR-pMHC prediction models. We collected, harmonized, and merged all the major publicly available TCR-pMHC binding data and used it to compare the performance of five state-of-the-art deep learning models (TITAN, NetTCR-2.0, ERGO, DLpTCR, and ImRex). Our performance evaluation focuses on two scenarios: 1) different splitting methods for generating training and testing data, to assess model generalization, and 2) different data versions that vary in size and peptide imbalance, to assess model robustness. Our results indicate that the five contemporary models do not generalize to peptides that were absent from the training set. We also show that model performance depends strongly on data balance and size, which indicates relatively low model robustness. These results suggest that TCR-pMHC binding prediction remains highly challenging and requires further high-quality data and novel algorithmic approaches.
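
To make the two data-handling ideas in the abstract more concrete (splitting so that test peptides are unseen during training, and generating negative examples), the following is a minimal Python sketch. It is not the authors' pipeline: the abstract does not specify their exact procedure, so the function names, the shuffling-based negative sampling, the 1:1 negative ratio, and the toy sequences are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def peptide_disjoint_split(pairs, test_fraction=0.2, seed=0):
    """Split (cdr3, peptide) pairs so that no peptide occurs in both sets.

    Hypothetical helper: whole peptides, not individual pairs, are assigned
    to the test set, so every test peptide is unseen during training.
    """
    rng = random.Random(seed)
    by_peptide = defaultdict(list)
    for cdr3, peptide in pairs:
        by_peptide[peptide].append((cdr3, peptide))

    peptides = sorted(by_peptide)  # sort first so the shuffle is reproducible
    rng.shuffle(peptides)
    n_test = max(1, round(len(peptides) * test_fraction))

    test = [p for pep in peptides[:n_test] for p in by_peptide[pep]]
    train = [p for pep in peptides[n_test:] for p in by_peptide[pep]]
    return train, test


def shuffled_negatives(positives, seed=0, max_tries=100):
    """Create up to one negative per positive by re-pairing its TCR.

    Assumption: a TCR paired with a peptide it was not observed to bind is
    treated as a non-binder. This is a common heuristic, not a verified label.
    """
    rng = random.Random(seed)
    known = set(positives)
    peptides = sorted({pep for _, pep in positives})
    negatives = []
    for cdr3, _ in positives:
        for _ in range(max_tries):
            candidate = (cdr3, rng.choice(peptides))
            if candidate not in known:
                negatives.append(candidate)
                break
    return negatives


# Toy, made-up sequences purely for illustration (not real binding data).
toy_pairs = [
    ("CASSAAAF", "PEPAAAAAA"),
    ("CASSGGGF", "PEPAAAAAA"),
    ("CASSLLLF", "PEPGGGGGG"),
    ("CASSTTTF", "PEPGGGGGG"),
    ("CASSVVVF", "PEPLLLLLL"),
    ("CASSPPPF", "PEPLLLLLL"),
]

train, test = peptide_disjoint_split(toy_pairs, test_fraction=0.34)
negatives = shuffled_negatives(toy_pairs)
```

A random split over pairs would let the same peptide appear on both sides, which is the easier setting; the peptide-disjoint split above corresponds to the harder generalization setting in which, according to the abstract, the compared models did not perform well.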

Keywords: MHC; T-cell receptor (TCR); TCR specificity prediction; machine learning/deep learning; peptide.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Histocompatibility Antigens
  • Major Histocompatibility Complex
  • Peptides*
  • Protein Binding
  • Receptors, Antigen, T-Cell*

Substances

  • Peptides
  • Receptors, Antigen, T-Cell
  • Histocompatibility Antigens

Grants and funding

This work was funded by the Deutsche Forschungsgemeinschaft (DFG) through grants CRC1192 (project number 264599542) and PR727/14-1 (project number 497674564). IP is funded by DFG FOR2799. SB, YZ and CL are funded by SFB 1192 projects B8 and C3, FOR 5068 P9, as well as by the 3R reduction of animal testing initiative of the UKE.