Identification of clinical heterogeneity and construction of a novel subtype predictive model in patients with ankylosing spondylitis: An unsupervised machine learning study

Int Immunopharmacol. 2023 Apr:117:109879. doi: 10.1016/j.intimp.2023.109879. Epub 2023 Feb 21.

Abstract

Background: Accurate classification of patients with ankylosing spondylitis (AS) is a prerequisite for precision medicine, because it allows medical interventions to be tailored to different patient types. AS pathology is closely related to changes in the immune microenvironment. In this study, we used unsupervised machine learning (UML) to classify patients with AS based on clinical characteristics. We then constructed a predictive model for a novel AS subtype based on this clinical classification and investigated differences in the immune microenvironment to clarify AS pathogenesis.

Methods: Overall, 196 patients with AS were enrolled. UML was used to cluster patients with similar clinical characteristics. Functional ability, disease status, and radiologic grading were assessed to verify the accuracy and heterogeneity of the UML clustering. Least Absolute Shrinkage and Selection Operator (LASSO) regression and the Random Forest algorithm were used to screen and identify predictive factors for the novel subtype of AS. Logistic regression was then used to construct a predictive model of this novel subtype. Datasets were downloaded from the Gene Expression Omnibus database to assess immune cell infiltration, and the results were validated using routine blood test data from 3671 AS patients and 5720 non-AS patients. The differential expression of Fat Mass and Obesity-Associated Protein (FTO), an m6A regulator, between AS patients and healthy control subjects was confirmed using immunohistochemistry.
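The clustering step can be illustrated with a minimal sketch. The abstract does not state which UML algorithm was used, so k-means (Lloyd's algorithm) with two clusters is an assumption here, and the patient values are synthetic numbers generated only for illustration, not study data. The column choice (CRP, NEU, MONO) and the helper names `standardize` and `kmeans` are likewise hypothetical.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so that no single lab value dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k=2, iters=50, seed=0):
    """Plain Lloyd's algorithm (an assumed stand-in for the study's UML method).

    Returns one cluster label per row.
    """
    rng = random.Random(seed)
    centers = rng.sample(rows, k)  # initialise centers from the data
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each patient goes to the nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append([mean(c) for c in zip(*members)] if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

# Synthetic, hypothetical patients: columns are CRP, NEU, MONO.
data_rng = random.Random(42)
low = [[data_rng.gauss(5, 1), data_rng.gauss(4, 0.5), data_rng.gauss(0.4, 0.05)]
       for _ in range(30)]
high = [[data_rng.gauss(40, 5), data_rng.gauss(8, 0.5), data_rng.gauss(0.9, 0.05)]
        for _ in range(30)]
patients = low + high

labels = kmeans(standardize(patients), k=2)
print(labels)
```

Standardizing first matters because CRP spans a much wider numeric range than monocyte counts. Without it, the distance would be driven almost entirely by CRP.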

Results: UML clustering identified two clusters whose clinical characteristics were significantly heterogeneous. For the novel subtype of AS identified by UML clustering, a predictive model was built using three predictive factors: C-reactive protein (CRP), absolute neutrophil count (NEU), and absolute monocyte count (MONO). The area under the curve (AUC) of the predictive model was 0.983. Heterogeneity in the neutrophil and monocyte counts in AS was verified through immune cell infiltration analysis. Routine blood test data showed that NEU and MONO were significantly higher in AS patients than in non-AS patients (p < 0.001). FTO expression was negatively correlated with both NEU and MONO. Immunohistochemistry analysis confirmed the downregulated expression of FTO.
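A three-predictor logistic model and its AUC can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the model is fitted by simple batch gradient descent, and the helper names (`fit_logistic`, `predict_proba`, `auc`) are hypothetical. The AUC is computed as the fraction of (positive, negative) pairs that the model ranks correctly.

```python
import math
import random
from statistics import mean, pstdev

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Prediction error for every patient under the current parameters.
        errs = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t
                for x, t in zip(X, y)]
        w = [wj - lr * sum(e * x[j] for e, x in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b

def predict_proba(X, w, b):
    """Predicted probability of belonging to the subtype, one per row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]

def auc(y_true, scores):
    """AUC as the share of correctly ranked (positive, negative) pairs; ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic, hypothetical data: columns are CRP, NEU, MONO; y = 1 marks the subtype.
rng = random.Random(7)
X_raw = ([[rng.gauss(8, 4), rng.gauss(4.5, 1), rng.gauss(0.45, 0.1)] for _ in range(100)]
         + [[rng.gauss(20, 6), rng.gauss(6, 1), rng.gauss(0.6, 0.1)] for _ in range(100)])
y = [0] * 100 + [1] * 100

# Z-score each column so that gradient descent behaves well.
cols = list(zip(*X_raw))
mus = [mean(c) for c in cols]
sds = [pstdev(c) or 1.0 for c in cols]
X = [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X_raw]

w, b = fit_logistic(X, y)
model_auc = auc(y, predict_proba(X, w, b))
print(f"in-sample AUC: {model_auc:.3f}")
```

The AUC printed here is an in-sample estimate on the same synthetic data used for fitting, so it is optimistic. A real evaluation would use held-out data or cross-validation.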

Conclusions: UML provides an interpretable and remarkable classification of a heterogeneous cohort of AS patients. A novel subtype of AS was identified by UML clustering. CRP, NEU, and MONO were independent predictive factors for this novel subtype. FTO expression was correlated with immune cell infiltration in AS patients.

Keywords: Ankylosing spondylitis; Immune cell infiltration; Immunohistochemistry; Machine learning; Predictive model.

MeSH terms

  • Alpha-Ketoglutarate-Dependent Dioxygenase FTO
  • C-Reactive Protein
  • Cluster Analysis
  • Databases, Factual
  • Humans
  • Spondylitis, Ankylosing* / genetics
  • Unsupervised Machine Learning

Substances

  • C-Reactive Protein
  • FTO protein, human
  • Alpha-Ketoglutarate-Dependent Dioxygenase FTO