Identification of Immune Cell Landscape and Construction of a Novel Diagnostic Nomogram for Crohn's Disease

Front Genet. 2020 Apr 29:11:423. doi: 10.3389/fgene.2020.00423. eCollection 2020.

Abstract

Crohn's disease (CD) has an increasing incidence and prevalence worldwide. The etiology of CD remains unclear, and there is no gold standard for diagnosis. A dysregulated immune response and altered immune-cell infiltration are critical for CD pathogenesis; therefore, it is important to provide an overview of immune-cell alterations in CD and to explore a novel method for auxiliary diagnosis. Here we analyzed microarray datasets from the Gene Expression Omnibus (GEO) and used CIBERSORTx, an extended version of Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts, to estimate the fractions of 22 types of immune cells. Differentially expressed genes (DEGs) were identified and a protein-protein interaction (PPI) network was constructed, and we performed gene set enrichment analysis (GSEA) and gene set variation analysis (GSVA) to identify differentially regulated pathways in CD. Least absolute shrinkage and selection operator (LASSO) regression was conducted to filter features, and a diagnostic nomogram based on logistic regression was built and validated in an independent validation cohort. In the derivation cohort, the proportions of 17 immune-cell types were significantly altered between patients with CD and healthy controls, and a total of 150 DEGs were identified, most of which were related to the immune response. Among the 15 hub genes based on the PPI network, C-X-C chemokine ligand 8 (CXCL8) and interleukin-1B (IL-1B) showed the highest degree of interaction. Additionally, GSEA and GSVA identified five significantly enriched pathways, among which the nucleotide-binding oligomerization domain (NOD)-like receptor signaling pathway was critical in CD development. Furthermore, six variables comprising CXCL8, IL-1B, M1 macrophages, regulatory T cells, CD8+ T cells, and plasma cells were identified by LASSO regression and incorporated into a logistic regression model.
The nomogram displayed good predictive performance, with an area under the receiver operating characteristic curve (AUC) of 0.915 and a C-index of 0.915 [95% confidence interval (CI): 0.875-0.955]. Similar results were found in the validation cohort, with an AUC of 0.884 and a C-index of 0.884 (95% CI: 0.843-0.924). These results provide novel in silico insight into the cellular and molecular characteristics of CD and potential biomarkers for diagnosis and targeted therapy.
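
The AUC and C-index reported for each cohort are identical, which is expected for a binary outcome: the C-index is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, and that probability equals the area under the ROC curve. The following is a minimal Python sketch of this equivalence. It uses made-up toy scores, not data from the study, and the function and variable names are illustrative assumptions rather than anything from the authors' pipeline.

```python
from itertools import product


def c_index(scores, labels):
    """Concordance index for a binary outcome.

    Fraction of (case, control) pairs in which the case has the higher
    score; tied pairs count as half-concordant.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for case, control in product(cases, controls):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


def auc_trapezoid(scores, labels):
    """Area under the ROC curve via a threshold sweep and the trapezoid rule."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sweep thresholds from the highest score down; tied scores move together.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    area = 0.0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return area


# Hypothetical predicted risks (1 = CD, 0 = healthy control); toy values only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]

print(c_index(toy_scores, toy_labels))
print(auc_trapezoid(toy_scores, toy_labels))
```

Both functions return the same value on the toy data, which is why a logistic-regression nomogram evaluated on a binary diagnosis yields matching AUC and C-index figures.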

Keywords: CIBERSORTx; Crohn’s disease; GSVA; immune cells; inflammatory bowel diseases; nomogram.