Identification of diagnostic signatures in ulcerative colitis patients via bioinformatic analysis integrated with machine learning

Hum Cell. 2022 Jan;35(1):179-188. doi: 10.1007/s13577-021-00641-w. Epub 2021 Nov 3.

Abstract

Ulcerative colitis (UC) is an immune-related disorder with increasing prevalence worldwide. Early diagnosis is critical for effective treatment of UC, but specific diagnostic signatures are still lacking. The aim of our study was to identify efficient signatures and construct a diagnostic model for UC. Microarray data sets GSE87473 and GSE48634, both obtained from tissue biopsy samples, were downloaded from the Gene Expression Omnibus (GEO), and differentially expressed gene (DEG), GO, and KEGG analyses were performed. We constructed a PPI network using the STRING database. Immune infiltration of the samples was evaluated using the CIBERSORT method combined with the LM22 feature matrix. A logistic regression model was constructed with the expression of selected genes as the predictor variables and UC occurrence as the response variable. A total of 126 DEGs between UC patients and normal counterparts were identified. The GO and KEGG analyses revealed enrichment of multiple biological processes and pathways, such as the antimicrobial humoral immune response mediated by antimicrobial peptide and the IL-17 signaling pathway. The infiltration of eight immune cell types (naive B cells, activated dendritic cells, M0 macrophages, M2 macrophages, resting mast cells, neutrophils, plasma cells, and follicular helper T cells) differed significantly between patients with UC and normal counterparts. The top 50 most significant DEGs were selected for construction of the PPI network. The average AUC of the logistic regression model in fivefold cross-validation was 0.8497 on the training set, GSE87473. The AUC on the independent validation set, GSE48634, also from GEO, was 0.7208. In conclusion, we identified potential hub genes, including REG3A, REG1A, DEFA6, REG1B, and DEFA5, which might be significantly associated with UC progression. The logistic regression model based on the five genes could reliably diagnose UC patients.
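To make the evaluation scheme in the abstract concrete, the following is a minimal sketch of a five-gene logistic regression scored by mean AUC over fivefold cross-validation. It is not the authors' pipeline. The expression values are synthetic stand-ins rather than GEO data, and the effect size, learning rate, epoch count, and all function names are illustrative assumptions. Only the gene names, the model type, and the fivefold AUC procedure come from the abstract.

```python
import math
import random

# Hub genes named in the abstract; used here only as feature labels.
GENES = ["REG3A", "REG1A", "DEFA6", "REG1B", "DEFA5"]


def make_synthetic(n=200, seed=0):
    """Generate synthetic stand-in data. These are NOT GEO expression values."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = UC, 0 = normal; balanced by construction
        # Assumption: UC samples carry a modest upward shift in every gene.
        X.append([rng.gauss(0.8 * label, 1.0) for _ in GENES])
        y.append(label)
    return X, y


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit by batch gradient descent on the log-loss; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, X):
    """Predicted probability of UC for each sample."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, k=5, seed=0):
    """Mean AUC over k held-out folds; each fold is scored by a model fit on the rest."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    aucs = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = predict_proba(model, [X[i] for i in fold])
        aucs.append(auc([y[i] for i in fold], scores))
    return sum(aucs) / k


X, y = make_synthetic()
mean_auc = cross_validated_auc(X, y)
print(f"mean fivefold AUC on synthetic data: {mean_auc:.3f}")
```

An independent validation step like the one reported for GSE48634 would correspond to fitting once on the full training set and calling `auc` on predictions for a separate cohort. The number printed here reflects only the synthetic data and says nothing about the published AUC values.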

Keywords: Bioinformatic analysis; Diagnosis; Differently expressed genes; Logistic regression; Ulcerative colitis.

MeSH terms

  • Colitis, Ulcerative / diagnosis*
  • Colitis, Ulcerative / genetics*
  • Computational Biology / methods*
  • Gene Expression / genetics
  • Genetic Association Studies / methods*
  • Humans
  • Lithostathine
  • Logistic Models
  • Machine Learning*
  • Pancreatitis-Associated Proteins
  • alpha-Defensins

Substances

  • DEFA5 protein, human
  • DEFA6 protein, human
  • Lithostathine
  • Pancreatitis-Associated Proteins
  • REG1A protein, human
  • REG1B protein, human
  • REG3A protein, human
  • alpha-Defensins