Weighted Gene Co-expression Network Analysis and Machine Learning Validation for Identifying Major Genes Related to Sjogren's Syndrome

Biochem Genet. 2024 Apr 28. doi: 10.1007/s10528-024-10750-4. Online ahead of print.

Abstract

Sjogren's syndrome (SS) is an autoimmune disorder characterized by dry mouth and dry eyes, and its pathogenesis remains unclear. This study aimed to integrate weighted gene co-expression network analysis (WGCNA) and machine learning to identify key genes associated with SS. We downloaded three publicly available datasets from the GEO database (GSE84844, GSE48378 and GSE51092), comprising gene expression data from 231 SS cases and 78 controls, and performed WGCNA to identify gene co-expression modules associated with SS. Candidate biomarkers were then selected using a LASSO regression model, and six machine-learning models were used to validate the candidate hub genes based on their expression. Finally, immune cell infiltration in SS tissue was estimated with the CIBERSORT algorithm. WGCNA grouped the genes into 10 modules, of which the blue and red modules were most closely associated with SS; these modules were significantly enriched in terms such as type I interferon signaling, cellular response to type I interferon and response to virus. The combined machine-learning analysis identified five hub genes: OAS1, EIF2AK2, IFITM3, TOP2A and STAT1. Immune cell infiltration analysis showed that SS was associated with CD8+ T cells, CD4+ T cells, gamma delta T cells, NK cells and dendritic cell activation. Combining WGCNA with machine learning uncovered genes that may be involved in SS pathogenesis and that could serve as candidate biomarkers and therapeutic targets for SS.
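The WGCNA step summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline (their code and parameters are not given here; the R WGCNA package is the usual implementation). It builds an unsigned soft-thresholded adjacency with an assumed power β = 6, converts it to a topological overlap matrix (TOM), clusters genes by average-linkage hierarchical clustering on 1 − TOM, and correlates each module eigengene with a binary disease label. The helper names, the fixed cut height (a simplification standing in for the dynamic tree cut normally used), and the simulated expression data are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr is a samples x genes matrix; beta = 6 is an assumed default.
    """
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # no self-connections
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    k = adj.sum(axis=1)  # connectivity of each gene
    shared = adj @ adj   # sum over shared neighbours u
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    tom = (tom + tom.T) / 2.0  # remove floating-point asymmetry
    np.fill_diagonal(tom, 1.0)
    return tom


def detect_modules(tom, cut_height=0.9):
    """Average-linkage clustering on 1 - TOM, cut at a fixed height.

    The fixed cut is a hypothetical simplification of dynamic tree cutting.
    """
    dissim = 1.0 - tom
    tree = linkage(squareform(dissim, checks=False), method="average")
    return fcluster(tree, t=cut_height, criterion="distance")


def module_eigengene(expr, genes):
    """First principal component of the standardized expression of a module."""
    z = expr[:, genes]
    z = (z - z.mean(axis=0)) / z.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0] * s[0]
    # Orient the eigengene so that it tracks the module's average expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


# Simulated data (hypothetical, not the GEO datasets): 60 samples, 30 genes.
rng = np.random.default_rng(0)
n = 60
trait = np.repeat([0.0, 1.0], n // 2)  # 0 = control, 1 = case
disease_factor = trait + 0.3 * rng.standard_normal(n)
other_factor = rng.standard_normal(n)
expr = np.hstack([
    disease_factor[:, None] + 0.2 * rng.standard_normal((n, 10)),  # genes 0-9
    other_factor[:, None] + 0.2 * rng.standard_normal((n, 10)),    # genes 10-19
    rng.standard_normal((n, 10)),                                  # genes 20-29: noise
])

labels = detect_modules(topological_overlap(soft_threshold_adjacency(expr)))
# Module-trait relationship: correlation of each module eigengene with the label.
module_r = {
    m: np.corrcoef(module_eigengene(expr, np.flatnonzero(labels == m)), trait)[0, 1]
    for m in np.unique(labels)
}
best = max(module_r, key=lambda m: abs(module_r[m]))
print("module of genes 0-9:", labels[0], "| most trait-related module:", best)
```

In the simulated data, genes 0 to 9 are driven by a disease-linked factor, so their module should show the strongest eigengene-trait correlation, analogous to how the blue and red modules were singled out in the study. On real data, β is usually chosen as the lowest power that gives an approximately scale-free topology fit rather than fixed in advance.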

Keywords: Machine learning; Sjogren’s syndrome; Weighted gene co-expression network analysis.