Integrative analysis identifies oxidative stress biomarkers in non-alcoholic fatty liver disease via machine learning and weighted gene co-expression network analysis

Front Immunol. 2024 Feb 27:15:1335112. doi: 10.3389/fimmu.2024.1335112. eCollection 2024.

Abstract

Background: Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease globally, with the potential to progress to non-alcoholic steatohepatitis (NASH), cirrhosis, and even hepatocellular carcinoma. Given the absence of effective treatments to halt its progression, novel molecular approaches to NAFLD diagnosis and treatment are of paramount importance.

Methods: We downloaded oxidative stress-related genes from the GeneCards database and retrieved NAFLD-related datasets from the GEO database. Using the Limma R package and Weighted Gene Co-expression Network Analysis (WGCNA), we identified NAFLD-related differentially expressed genes and genes closely associated with NAFLD. Intersecting the oxidative stress-related genes, the NAFLD-related genes, and the WGCNA-derived genes yielded 31 intersection genes. From these 31 intersection genes between NAFLD and oxidative stress (OS), we identified three hub genes using three machine learning algorithms: Least Absolute Shrinkage and Selection Operator (LASSO) regression, Support Vector Machine-Recursive Feature Elimination (SVM-RFE), and Random Forest. A nomogram was then used to predict the incidence of NAFLD. The CIBERSORT algorithm was employed for immune infiltration analysis, single-sample Gene Set Enrichment Analysis (ssGSEA) for functional enrichment analysis, and Protein-Protein Interaction (PPI) networks to explore the relationships between the three hub genes and the other intersection genes of NAFLD and OS. The distribution of the three hub genes across six cell clusters was determined using single-cell RNA sequencing. Finally, using relevant data from the Attie Lab Diabetes Database and liver tissues from a NASH mouse model, Western blot (WB) and reverse transcription quantitative polymerase chain reaction (RT-qPCR) assays were conducted, which further validated the significant roles of CDKN1B and TFAM in NAFLD.
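The gene-set intersection step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: every gene symbol below is a hypothetical placeholder rather than data from the study, and the helper name `intersect_gene_sets` is invented for illustration. The sketch also assumes that hub genes are taken as the overlap of the genes selected by LASSO, SVM-RFE, and Random Forest; the abstract does not state how the three algorithms' outputs were combined.

```python
from functools import reduce


def intersect_gene_sets(*gene_sets):
    """Return the genes present in every input collection, sorted alphabetically."""
    if not gene_sets:
        return []
    common = reduce(lambda a, b: a & b, (set(g) for g in gene_sets))
    return sorted(common)


# Placeholder inputs (hypothetical symbols, NOT genes from the study).
oxidative_stress_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}
differentially_expressed_genes = {"GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"}
wgcna_module_genes = {"GENE_C", "GENE_D", "GENE_E", "GENE_G"}

# Step 1: candidate genes shared by all three sources.
candidates = intersect_gene_sets(
    oxidative_stress_genes,
    differentially_expressed_genes,
    wgcna_module_genes,
)

# Step 2 (assumption): hub genes are those chosen by all three selectors.
# The per-method selections are hard-coded placeholders, not model outputs.
lasso_selected = {"GENE_C", "GENE_D"}
svm_rfe_selected = {"GENE_C", "GENE_D", "GENE_E"}
random_forest_selected = {"GENE_C", "GENE_D"}

hub_genes = intersect_gene_sets(
    lasso_selected,
    svm_rfe_selected,
    random_forest_selected,
)

print("Candidate genes:", candidates)
print("Consensus hub genes:", hub_genes)
```

In a real analysis, the three selections would come from fitted models rather than hard-coded sets; requiring agreement across methods is one common way to reduce method-specific false positives.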

Results: We identified 31 genes with a strong association with oxidative stress in NAFLD. Subsequent machine learning analysis and external validation pinpointed two genes, CDKN1B and TFAM, as showing the closest correlation with oxidative stress in NAFLD.

Conclusion: This investigation found two hub genes that hold potential as novel targets for the diagnosis and treatment of NAFLD, thereby offering innovative perspectives for its clinical management.

Keywords: CDKN1B; NDUFA4; TFAM; WGCNA; bioinformatic analysis; machine learning; non-alcoholic fatty liver disease.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Biomarkers
  • Gene Expression Profiling
  • Genes, cdc
  • Liver Neoplasms*
  • Mice
  • Non-alcoholic Fatty Liver Disease*

Substances

  • Biomarkers

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research was funded by the Jiangsu medical scientific research project of Jiangsu Health Commission, Jiangsu Province Capability Improvement Project through Science, Technology and Education, Jiangsu Provincial Medical Key Discipline Cultivation Unit (JSDW202235), the National Natural Science Foundation of China (grant number 81870409), the 789 Outstanding Talent Program of SAHNMU (789ZYRC202070102), China Postdoctoral Science Foundation (2023M730675), and Shanghai Sailing Program (23YF1406800).