Construction and validation of a machine learning model for the diagnosis of juvenile idiopathic arthritis based on fecal microbiota

Front Cell Infect Microbiol. 2024 Mar 8:14:1371371. doi: 10.3389/fcimb.2024.1371371. eCollection 2024.

Abstract

Purpose: The human gut microbiota has been shown to be significantly associated with various inflammatory diseases. This study therefore aimed to develop an auxiliary tool for the diagnosis of juvenile idiopathic arthritis (JIA) based on fecal microbial biomarkers.

Method: Fecal metagenomic sequencing data associated with JIA were retrieved from NCBI and converted into relative microbial abundances using data-cleaning tools (KneadData, Trimmomatic and Bowtie2) followed by taxonomic classification and abundance-estimation software (Kraken2 and Bracken). Fecal microbes with high abundance were then extracted for subsequent analysis. These microbes were further screened by least absolute shrinkage and selection operator (LASSO) regression, and the selected fecal microbial biomarkers were used for model training. We constructed six different machine learning (ML) models and selected the best one for building a JIA diagnostic tool by comparing model performance on a combination of area under the receiver operating characteristic curve (AUC), accuracy, specificity, F1 score, calibration curves and clinical decision curves. In addition, to interpret the model, permutation importance analysis and Shapley Additive Explanations (SHAP) were performed to quantify the contribution of each biomarker to the prediction.
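As an illustration of the threshold-free and threshold-based metrics named above, the following is a minimal sketch in plain Python rather than the authors' actual evaluation code. The function names (`auc_score`, `threshold_metrics`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example; label 1 stands for JIA and 0 for non-JIA.

```python
def auc_score(y_true, scores):
    """AUC as the probability that a random positive sample is scored
    higher than a random negative sample (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, specificity and F1 at a fixed threshold (0.5 is an assumption)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Hypothetical toy data: four JIA samples (1) and two non-JIA samples (0).
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.2]

print(auc_score(y_true, scores))          # 0.75
print(threshold_metrics(y_true, scores))  # accuracy 4/6, specificity 0.5, F1 0.75
```

With a class imbalance as strong as the one in this cohort, accuracy and F1 can look high even when few non-JIA samples are classified correctly, which is one reason to report specificity and AUC alongside them.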

Result: A total of 231 individuals were included in this study, of whom 203 were JIA patients and the remaining 28 were non-JIA individuals. In the genus-level diversity analysis, alpha diversity, represented by the Shannon index, did not differ significantly between the two groups, whereas beta diversity differed slightly. After LASSO regression, 10 fecal microbial biomarkers were selected for model training. Among the six models compared, the extreme gradient boosting (XGB) model performed best, with an average AUC, accuracy and F1 score of 0.976, 0.914 and 0.952, respectively, and was therefore used to construct the final JIA diagnosis model.
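To make the two diversity measures concrete, here is a minimal sketch in plain Python. It is not the authors' analysis code: the abstract does not state which beta-diversity metric was used, so Bray-Curtis dissimilarity is an assumption, as are the natural-log base for the Shannon index, the function names, and the toy genus counts.

```python
import math


def relative_abundance(counts):
    """Convert raw per-genus read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: c / total for taxon, c in counts.items()}


def shannon_index(counts):
    """Alpha diversity of one sample: H = -sum(p * ln p) over observed genera."""
    props = relative_abundance(counts).values()
    return -sum(p * math.log(p) for p in props if p > 0)


def bray_curtis(counts_a, counts_b):
    """Beta diversity between two samples, computed on relative abundances
    (assumed metric): 0 means identical profiles, 1 means no shared genera."""
    ra, rb = relative_abundance(counts_a), relative_abundance(counts_b)
    taxa = set(ra) | set(rb)
    num = sum(abs(ra.get(t, 0.0) - rb.get(t, 0.0)) for t in taxa)
    den = sum(ra.get(t, 0.0) + rb.get(t, 0.0) for t in taxa)
    return num / den


# Hypothetical toy samples; genus names and counts are illustrative only.
sample_a = {"Bacteroides": 50, "Faecalibacterium": 50}
sample_b = {"Bacteroides": 50, "Prevotella": 50}

print(shannon_index(sample_a))          # ln(2), about 0.693
print(bray_curtis(sample_a, sample_b))  # 0.5
```

The toy samples have the same Shannon index but a Bray-Curtis dissimilarity of 0.5, which shows how two groups can be similar in alpha diversity while still differing in composition.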

Conclusion: A JIA diagnosis model based on the XGB algorithm was constructed and showed excellent performance; it may assist physicians in the early detection of JIA and thereby improve patient prognosis.

Keywords: XGB algorithm; diagnosis; fecal microbes; juvenile idiopathic arthritis; machine learning.

MeSH terms

  • Arthritis, Juvenile* / diagnosis
  • Arthritis, Juvenile* / genetics
  • Biomarkers
  • Humans
  • Machine Learning
  • Microbiota*
  • ROC Curve

Substances

  • Biomarkers

Grants and funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.