A Novel 3-Gene Signature for Identifying COVID-19 Patients Based on Bioinformatics and Machine Learning

Genes (Basel). 2022 Sep 8;13(9):1602. doi: 10.3390/genes13091602.

Abstract

Although many biomarkers associated with coronavirus disease 2019 (COVID-19) have been identified, a novel signature relevant to immune cells has not yet been developed. In this work, the "CIBERSORT" algorithm was used to estimate the fractions of infiltrating immune cells in GSE152641 and GSE171110. Key modules associated with important immune cells were selected with the "WGCNA" package. "GO" enrichment analysis was used to reveal the biological functions associated with COVID-19. The "Boruta" algorithm was used to screen candidate genes, and the "LASSO" algorithm was used to reduce collinearity. A novel gene signature was then developed by multivariate logistic regression analysis. M0 macrophages (PRAUC = 0.948 in GSE152641 and PRAUC = 0.981 in GSE171110) and neutrophils (PRAUC = 0.892 in GSE152641 and PRAUC = 0.960 in GSE171110) were identified as important immune cells. Forty-three intersecting genes from two modules were selected; they mainly participate in immune-related activities. Finally, a three-gene signature comprising CLEC4D, DUSP13, and UNC5A was constructed that can accurately distinguish COVID-19 patients from healthy controls in three datasets. The ROCAUC was 0.974 in the training set, 0.946 in the internal test set, and 0.709 in the external test set. In conclusion, we constructed a three-gene signature to identify COVID-19, and CLEC4D, DUSP13, and UNC5A may be potential biomarkers for COVID-19 patients.
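
As a rough illustration of the final step, the sketch below shows how a logistic-regression signature over CLEC4D, DUSP13, and UNC5A could turn expression values into a probability, and how a ROCAUC could be computed from such scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not report the fitted coefficients, so COEFFS and INTERCEPT are hypothetical placeholders, and the function names are illustrative only.

```python
import math

# HYPOTHETICAL values for illustration only: the abstract does not
# report the fitted intercept or per-gene coefficients.
INTERCEPT = -3.0
COEFFS = {"CLEC4D": 1.2, "DUSP13": 0.8, "UNC5A": 0.5}


def signature_probability(expression):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_g b_g * x_g))).

    `expression` maps each signature gene to its expression value.
    """
    z = INTERCEPT + sum(COEFFS[gene] * expression[gene] for gene in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which
    the positive sample scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Scoring each sample with signature_probability and passing the scores to roc_auc, together with labels of 1 for patients and 0 for controls, gives a single number comparable in kind to the ROCAUC values quoted above. The pairwise definition is used here only because it needs no external library; it is quadratic in sample count, which is fine for a sketch.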

Keywords: Boruta; CIBERSORT; GO; LASSO; WGCNA; biomarker; coronavirus disease 2019; multivariate logistic regression.

MeSH terms

  • COVID-19* / genetics
  • Computational Biology*
  • Humans
  • Machine Learning

Grants and funding

This research received no external funding.