An Interpretable Classification Model Using Gluten-Specific TCR Sequences Shows Diagnostic Potential in Coeliac Disease

Biomolecules. 2023 Nov 25;13(12):1707. doi: 10.3390/biom13121707.

Abstract

Coeliac disease (CeD) is a T-cell-mediated enteropathy triggered by dietary gluten that remains substantially under-diagnosed around the world. The diagnostic gold standard requires histological assessment of intestinal biopsies taken at endoscopy while the patient is consuming a gluten-containing diet. However, there is a lack of concordance between pathologists in histological assessment, and both endoscopy and gluten challenge are burdensome and unpleasant for patients. Identification of gluten-specific T-cell receptors (TCRs) in the TCR repertoire could provide a less subjective diagnostic test and potentially remove the need to consume gluten. We review published gluten-specific TCR sequences and develop an interpretable machine learning model to investigate their diagnostic potential. To this end, we sequenced the TCR repertoires of mucosal CD4+ T cells from 20 patients with and without CeD. These data were used as the training dataset to develop the model, and an independently published dataset of 20 patients was used as the testing dataset. The model achieved a training accuracy of 100% and a testing accuracy of 80% for the diagnosis of CeD, including in patients on a gluten-free diet (GFD). We identified 20 CD4+ TCR sequences with the highest diagnostic potential for CeD. The sequences identified here have the potential to provide an objective diagnostic test for CeD that does not require the consumption of gluten.
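
The abstract does not specify the form of the interpretable model, so the following Python sketch is only an illustration of the general idea and not the authors' method. It assumes a deliberately simple classifier: count how many sequences in a patient's repertoire appear in a reference set of gluten-specific TCRs, then choose the count threshold that maximises accuracy on a training set. The reference strings, function names, and threshold rule are all hypothetical; the reference entries are placeholders, not real CDR3 sequences.

```python
from typing import FrozenSet, Iterable, Sequence

# Hypothetical placeholders only: these are NOT real gluten-specific sequences.
REFERENCE_TCRS: FrozenSet[str] = frozenset(
    {"CDR3_PLACEHOLDER_A", "CDR3_PLACEHOLDER_B", "CDR3_PLACEHOLDER_C"}
)


def match_count(repertoire: Iterable[str],
                reference: FrozenSet[str] = REFERENCE_TCRS) -> int:
    """Number of distinct repertoire sequences found in the reference set."""
    return len(set(repertoire) & reference)


def accuracy(counts: Sequence[int], labels: Sequence[bool], threshold: int) -> float:
    """Fraction of patients classified correctly when 'count >= threshold' means CeD."""
    correct = sum((c >= threshold) == y for c, y in zip(counts, labels))
    return correct / len(labels)


def fit_threshold(counts: Sequence[int], labels: Sequence[bool]) -> int:
    """Assumed training rule: the smallest threshold with the best training accuracy."""
    candidates = range(0, max(counts) + 2)
    return max(candidates, key=lambda t: accuracy(counts, labels, t))
```

A model of this kind is interpretable in the sense that each prediction can be traced back to the specific reference sequences found in the patient's repertoire. Test accuracy would then be computed by calling `accuracy` on an independent dataset with the threshold fixed from training.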

Keywords: T-cell repertoire; coeliac disease; gluten-free diet; machine learning; next generation sequencing.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Celiac Disease* / diagnosis
  • Diet
  • Glutens
  • Humans
  • Receptors, Antigen, T-Cell / genetics
  • T-Lymphocytes / pathology

Substances

  • Glutens
  • Receptors, Antigen, T-Cell