Selection of key genes for dilated cardiomyopathy based on machine learning algorithms and assessment of diagnostic accuracy

J Thorac Dis. 2023 Aug 31;15(8):4445-4455. doi: 10.21037/jtd-23-1086. Epub 2023 Aug 23.

Abstract

Background: The mechanisms underlying the onset and progression of dilated cardiomyopathy remain unclear and require further exploration. Advances in programming languages and biological databases have made it possible to explore the structural and functional information of biological molecules at the nucleic acid and protein levels, to screen key pathogenic genes, and to elucidate pathogenic mechanisms. This study aimed to screen key pathogenic genes using machine learning algorithms and to explore the correlation between these genes and the immune microenvironment, using transcriptome sequencing datasets of myocardial samples from patients with dilated cardiomyopathy, in order to provide new insight into the pathogenesis of the disease.

Methods: Transcriptome sequencing datasets of heart tissue from patients with dilated cardiomyopathy were downloaded from the Gene Expression Omnibus (GEO) database (GSE29819 and GSE21610). Differentially expressed genes (DEGs) between pathological and normal tissues were identified. Key genes were screened using least absolute shrinkage and selection operator (LASSO) regression analysis and the random forest algorithm. The diagnostic efficiency of the key genes for the disease was evaluated using receiver operating characteristic (ROC) curves.
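As an illustration of how LASSO can narrow a gene list, the following is a minimal sketch of L1-penalized least squares fitted by coordinate descent. It is not the authors' pipeline: the abstract does not state the model form, software, or penalty used, so the linear fit on 0/1 labels, the penalty `lam`, the gene names, and the expression values are all assumptions made for the example.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def standardize(col):
    """Center a column and scale it to unit (population) standard deviation."""
    mean = sum(col) / len(col)
    sd = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
    return [(v - mean) / sd for v in col]


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent.

    Assumes the columns of X are standardized and y is centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of gene j with the residual that excludes gene j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Hypothetical data: 4 disease samples followed by 4 controls, 3 candidate genes.
genes = ["GENE_A", "GENE_B", "GENE_C"]  # placeholder names, not from the study
raw = {
    "GENE_A": [5.1, 4.8, 5.3, 4.9, 2.0, 2.2, 1.9, 2.1],  # tracks disease status
    "GENE_B": [1, 2, 1, 2, 1, 2, 1, 2],                  # unrelated to status
    "GENE_C": [3, 3, 1, 1, 3, 3, 1, 1],                  # unrelated to status
}
labels = [1, 1, 1, 1, 0, 0, 0, 0]

cols = [standardize(raw[g]) for g in genes]
X = [[cols[j][i] for j in range(len(genes))] for i in range(len(labels))]
y_mean = sum(labels) / len(labels)
y = [v - y_mean for v in labels]

beta = lasso_cd(X, y, lam=0.1)  # lam chosen arbitrarily for the illustration
selected = [g for g, b in zip(genes, beta) if abs(b) > 1e-9]
print(selected)
```

Genes whose coefficients are shrunk exactly to zero drop out of the list. In practice the penalty would usually be chosen by cross-validation, and the surviving genes could then be compared with those ranked highly by a random forest.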

Results: Compared with the normal heart tissue (control group) samples, there were 213 DEGs in the heart tissue samples of patients with dilated cardiomyopathy (treat group), including 101 upregulated and 102 downregulated genes. CCL5 and CTGF were more highly expressed in the treat group than in the control group. ROC analysis showed that the areas under the curve (AUCs) of CCL5 and CTGF were 0.821 and 0.902, respectively (P<0.05). In the treat group samples, CCL5 expression was positively correlated with the infiltration levels of most immune cell subtypes.
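For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen disease sample shows a higher expression value than a randomly chosen control sample. The sketch below computes it that way for a single gene; the expression values are hypothetical and do not reproduce the reported AUCs.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each group")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression of one gene: 4 disease samples, then 4 controls.
expression = [2.1, 3.4, 2.9, 4.0, 1.2, 1.9, 2.5, 1.0]
is_disease = [1, 1, 1, 1, 0, 0, 0, 0]

auc = roc_auc(expression, is_disease)
print(auc)  # 15 of 16 pairs are ranked correctly, so 0.9375
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.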

Conclusions: CCL5 and CTGF are key disease-causing genes in dilated cardiomyopathy and have good diagnostic efficiency for the disease. CCL5 and CTGF may be related to immune cell enrichment and myocardial fibrosis, respectively.

Keywords: Dilated cardiomyopathy; immune microenvironment; machine learning algorithms.