Identification and validation of cuproptosis related genes and signature markers in bronchopulmonary dysplasia disease using bioinformatics analysis and machine learning

BMC Med Inform Decis Mak. 2023 Apr 14;23(1):69. doi: 10.1186/s12911-023-02163-x.

Abstract

Background: Bronchopulmonary dysplasia (BPD) has a high incidence and impairs the health of preterm infants. Cuproptosis is a novel form of cell death, but its mechanism of action in BPD is not yet clear. Machine learning, a recent tool for analyzing biological samples, is still rarely used for in-depth analysis and prediction of disease.

Methods and results: First, differentially expressed cuproptosis-related genes (CRGs) were extracted from the GSE108754 dataset. The heat map showed that NFE2L2 expression was significantly higher in the control group, whereas GLS expression was significantly higher in the treatment group. Chromosome location analysis showed that the two genes were positively correlated and that both are associated with chromosome 2. Immune infiltration and immune cell differential analysis showed differences in four immune cell types, with a significant difference in monocytes. Consensus clustering of CRG expression yielded two subgroups, through which five new pathways were analyzed. Weighted correlation network analysis (WGCNA), with the screening condition set to the top 25%, was used to obtain the disease signature genes. Four machine learning algorithms, generalized linear model (GLM), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB), were used to screen the disease signature genes and yielded the final five marker genes for disease prediction. The model constructed with the GLM method proved more accurate in validation on two datasets, GSE190215 and GSE188944.
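The "top 25%" screening step can be illustrated with a minimal sketch. The abstract does not state what the genes are ranked by. Ranking by variance across samples is a common preprocessing choice before WGCNA and is assumed here. The function name, gene labels, and expression values below are hypothetical and are not taken from the study.

```python
from statistics import pvariance


def top_variance_genes(expression, fraction=0.25):
    """Return the gene names in the top `fraction` by variance across samples.

    `expression` maps a gene name to its list of per-sample expression values.
    Ranking by variance is an assumption; the abstract only says "top 25%".
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    variances = {gene: pvariance(values) for gene, values in expression.items()}
    # Sort by descending variance; break ties by gene name for determinism.
    ranked = sorted(variances, key=lambda gene: (-variances[gene], gene))
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical toy data: four genes measured in four samples.
toy_expression = {
    "GENE_A": [1.0, 5.0, 1.0, 5.0],
    "GENE_B": [2.0, 2.1, 2.0, 2.1],
    "GENE_C": [3.0, 3.0, 3.0, 3.0],
    "GENE_D": [0.0, 10.0, 0.0, 10.0],
}
selected = top_variance_genes(toy_expression)
print(selected)
```

In a real analysis the retained genes would then be passed to the network construction step. The sketch covers only the filtering.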

Conclusion: We identified two cuproptosis-associated genes, NFE2L2 and GLS. A GLM-based machine learning model was constructed to predict the prevalence of BPD, and five disease signature genes, NFATC3, ERMN, PLA2G4A, MTMR9LP and LOC440700, were identified. These genes, identified through bioinformatics analysis, could be potential targets for the identification and treatment of BPD.

Keywords: Bioinformatics analysis; Biomarkers; Bronchopulmonary dysplasia disease; Cuproptosis; Machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Apoptosis*
  • Bronchopulmonary Dysplasia* / genetics
  • Cluster Analysis
  • Computational Biology
  • Copper
  • Humans
  • Infant
  • Infant, Newborn
  • Infant, Premature

Substances

  • Copper