Identification of key gene expression associated with quality of life after recovery from COVID-19

Med Biol Eng Comput. 2024 Apr;62(4):1031-1048. doi: 10.1007/s11517-023-02988-8. Epub 2023 Dec 21.

Abstract

Post-acute sequelae of COVID-19 (PASC) is a persistent complication of severe acute respiratory syndrome coronavirus 2 infection that includes symptoms such as fatigue, cognitive impairment, and respiratory distress. These symptoms severely affect the quality of life of patients after their recovery from COVID-19. In this study, a group of machine learning algorithms was applied to whole-blood RNA-seq data from patients with different PASC levels. The purpose of this analysis was to identify gene markers associated with PASC and the expression patterns specific to each PASC level. By comparing each patient's quality of life after the acute phase of COVID-19 with that before the disease, samples in the dataset were divided into three groups, namely, "Better," "The Same," and "Worse." Each patient was represented by the expression levels of 58,929 genes. The machine learning-based workflow included six feature-ranking algorithms, incremental feature selection (IFS), and four classification algorithms. The feature-ranking algorithms assessed feature importance, whereas IFS combined with the classification algorithms was used to extract essential genes and to construct efficient classifiers and classification rules. The expression of the top-ranked genes was associated with the immune response to viral infection, a finding supported by the published literature. For example, patients with low CCDC18 expression and high CPED1 expression had good quality of life, whereas those with low CDC16 expression had poor quality of life.
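
The workflow described above (rank features, then grow the feature set incrementally and keep the subset that classifies best) can be illustrated with a minimal sketch. The abstract does not name the six ranking algorithms or the four classifiers, so everything concrete here is an assumption made for illustration: a between-class to within-class variance ratio stands in for a feature-ranking algorithm, a nearest-centroid rule with leave-one-out evaluation stands in for a classifier, and the data are synthetic (8 hypothetical features instead of 58,929 genes). It is not the authors' implementation.

```python
import random
from statistics import mean


def rank_features(X, y):
    """Rank feature indices by between-class / within-class variance
    (an assumed stand-in for one feature-ranking algorithm)."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall = mean(column)
        between, within = 0.0, 0.0
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            m = mean(vals)
            between += len(vals) * (m - overall) ** 2
            within += sum((v - m) ** 2 for v in vals)
        scores.append(between / (within + 1e-12))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def nearest_centroid_predict(X_train, y_train, x):
    """Predict the class whose mean vector is closest to x
    (an assumed stand-in for one classification algorithm)."""
    centroids = {}
    for c in sorted(set(y_train)):
        rows = [r for r, label in zip(X_train, y_train) if label == c]
        centroids[c] = [mean(col) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy using only the given feature indices."""
    hits = 0
    for i in range(len(X)):
        X_train = [[r[j] for j in features] for k, r in enumerate(X) if k != i]
        y_train = [label for k, label in enumerate(y) if k != i]
        x = [X[i][j] for j in features]
        hits += nearest_centroid_predict(X_train, y_train, x) == y[i]
    return hits / len(X)


def incremental_feature_selection(X, y, ranking, step=1):
    """Evaluate the top-k ranked features for growing k and
    return the smallest subset with the highest accuracy."""
    best_k, best_acc = 0, -1.0
    for k in range(step, len(ranking) + 1, step):
        acc = loo_accuracy(X, y, ranking[:k])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranking[:best_k], best_acc


# Synthetic data (assumption): 3 groups x 10 samples, 8 features,
# of which only features 0 and 1 carry group information.
random.seed(0)
labels = ["Better", "The Same", "Worse"]
X, y = [], []
for idx, label in enumerate(labels):
    for _ in range(10):
        row = [random.gauss(0, 1) for _ in range(8)]
        row[0] = random.gauss(3 * idx, 0.5)
        row[1] = random.gauss(-3 * idx, 0.5)
        X.append(row)
        y.append(label)

ranking = rank_features(X, y)
selected, best_acc = incremental_feature_selection(X, y, ranking)
print("ranking:", ranking)
print("selected features:", selected, "LOO accuracy:", round(best_acc, 3))
```

In a setting with tens of thousands of genes, a larger `step` would typically be used so that not every subset size has to be evaluated; the step size here is likewise an illustrative choice.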

Keywords: Post-acute sequelae of COVID-19; Severe acute respiratory syndrome coronavirus 2.

MeSH terms

  • Algorithms
  • COVID-19*
  • Cognitive Dysfunction*
  • Disease Progression
  • Gene Expression
  • Humans
  • Quality of Life