Identification of ferroptosis-related genes in ulcerative colitis: a diagnostic model with machine learning

Ann Transl Med. 2023 Feb 28;11(4):177. doi: 10.21037/atm-23-276.

Abstract

Background: Ulcerative colitis (UC) is an idiopathic, chronic disorder characterized by inflammation, injury, and disruption of the colonic mucosa. Its diagnosis and differential diagnosis nevertheless remain difficult. A growing body of research has linked ferroptosis to the etiology of UC. This study therefore aimed to identify key ferroptosis-related genes in UC to provide new ideas for diagnosing the disease.

Methods: Gene expression profiles of normal and UC samples were extracted from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs), genes identified by weighted correlation network analysis (WGCNA), and ferroptosis-related genes were intersected, and hub genes were screened from the overlapping genes using Lasso regression. Gene Ontology (GO) and gene set enrichment analysis (GSEA) were then performed on the hub genes. The NaiveBayes, Logistic, IBk, and RandomForest algorithms were used to build a disease diagnosis model based on the hub genes. The model was validated using GSE87473 as the validation set.
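The candidate pool passed to Lasso is the set of genes shared by all three lists (DEGs, WGCNA module genes, and ferroptosis-related genes). A minimal sketch of that overlap step is shown below. The gene names `G1` to `G8` and the helper `candidate_genes` are hypothetical placeholders, not the study's data or code; the Lasso screening and model building are not reproduced here.

```python
def candidate_genes(*gene_sets):
    """Return the genes present in every input set, sorted for stable output."""
    if not gene_sets:
        return []
    common = set(gene_sets[0])
    for genes in gene_sets[1:]:
        common &= set(genes)  # keep only genes also present in this list
    return sorted(common)


# Hypothetical toy lists for illustration only (not results from the study).
degs = {"G1", "G2", "G3", "G4", "G5", "G6"}
wgcna_module = {"G2", "G3", "G4", "G5", "G7"}
ferroptosis = {"G3", "G4", "G5", "G8"}

print(candidate_genes(degs, wgcna_module, ferroptosis))  # ['G3', 'G4', 'G5']
```

In a real analysis the overlapping genes would then serve as the features for Lasso regression, which shrinks uninformative coefficients to zero and leaves the hub genes.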

Results: Gene expression matrices of GSE87466 and GSE75214, comprising 184 UC patients and 43 control samples, were downloaded from the GEO database. A total of 699 DEGs were obtained, and 565 ferroptosis-related genes were identified from FerrDb. WGCNA analysis yielded 1,513 genes from the MEblue module, selected by the highest absolute correlation coefficient. Based on the genes shared by the DEGs, WGCNA-identified genes, and ferroptosis-related genes, the Lasso regression algorithm identified five hub genes (LCN2, MUC1, PARP8, PLIN2, and TIMP1). GO and GSEA analyses revealed that these hub genes were involved in negative regulation of transcription by competitive promoter binding, cellular response to UV-A, the citrate cycle (TCA cycle), the cytosolic DNA-sensing pathway, and beta-alanine metabolism. Among the algorithms compared, the Logistic algorithm achieved area under the curve (AUC) values of 1.000 and 0.995 for the training and validation cohorts, respectively, with a sensitivity of 0.962 and a specificity of 1.000.
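To make the reported metrics concrete, the sketch below computes AUC, sensitivity, and specificity from predicted scores. It is a generic illustration, not the study's evaluation code: AUC is computed as the probability that a randomly chosen UC sample scores higher than a randomly chosen control (ties count as one half), and the 0.5 decision threshold and the toy labels and scores are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) at an assumed threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = UC, 0 = control; scores are predicted probabilities.
labels = [1, 1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.45]

auc = roc_auc(labels, scores)                          # 11/12, about 0.917
sens, spec = sensitivity_specificity(labels, scores)   # (0.75, 1.0)
```

AUC summarizes ranking quality across all thresholds, whereas sensitivity and specificity depend on the chosen cut-off, which is why a model can report an AUC near 1.000 alongside a sensitivity below 1.000.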

Conclusions: The identified hub genes are closely related to ferroptosis in UC and can distinguish UC patients from controls. By detecting the expression of these five genes, the model may aid in diagnosing UC and in understanding the etiology and treatment of the disease.

Keywords: Ulcerative colitis; bioinformatic analysis; diagnostic model; ferroptosis.