Identifying COVID-19-Specific Transcriptomic Biomarkers with Machine Learning Methods

Biomed Res Int. 2021 Jul 6;2021:9939134. doi: 10.1155/2021/9939134. eCollection 2021.

Abstract

COVID-19, a severe respiratory disease caused by the novel coronavirus SARS-CoV-2, has spread worldwide. Patients infected with SARS-CoV-2 may show no symptoms, as is the case for presymptomatic and asymptomatic patients. Both groups can still transmit the virus to susceptible people, which makes COVID-19 difficult to control. The two major challenges for COVID-19 diagnosis at present are as follows: (1) patients may share symptoms with other respiratory infections, and (2) patients may have no symptoms yet still spread the virus. Therefore, new biomarkers at different omics levels are required for the large-scale screening and diagnosis of COVID-19. Although initial analyses identified a group of candidate gene biomarkers for COVID-19, that work did not yield biomarkers suitable for clinical use, which requires a diagnosis that is specific to COVID-19 relative to multiple other infectious diseases. As an extension of the previous study, the present study applied optimized machine learning models to identify specific qualitative host biomarkers associated with COVID-19 infection, using a publicly released transcriptomic dataset that included healthy controls and patients with bacterial infection, influenza, COVID-19, and other coronavirus infections. The dataset was first analysed with the Boruta and Max-Relevance and Min-Redundancy feature selection methods in sequence, resulting in a feature list. This list was fed into the incremental feature selection method, paired with one of several classification algorithms, to extract essential biomarkers and build efficient classifiers and classification rules. The capacity of these findings to distinguish COVID-19 from other similar respiratory infectious diseases at the transcriptomic level was also validated, which may improve the efficacy and accuracy of COVID-19 diagnosis.
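
The incremental feature selection step described in the abstract can be illustrated with a minimal sketch. It assumes a feature ranking has already been produced upstream (in the study, by Boruta followed by Max-Relevance and Min-Redundancy) and then scores the top-k features for each k. The abstract does not name the classifier or the validation scheme, so the nearest-centroid classifier and leave-one-out evaluation below are stand-ins chosen only to keep the example dependency-free. The expression values, labels, feature indices, and all function names are hypothetical.

```python
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid is closest to x (squared Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return min(
        sorted(centroids),  # sorted so ties break deterministically
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def loo_accuracy(X, y):
    """Leave-one-out accuracy of the stand-in nearest-centroid classifier."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


def incremental_feature_selection(X, y, ranked_features):
    """Score the top-k ranked features for k = 1..N and return (best_k, scores)."""
    scores = []
    for k in range(1, len(ranked_features) + 1):
        cols = ranked_features[:k]
        X_k = [[row[c] for c in cols] for row in X]
        scores.append(loo_accuracy(X_k, y))
    best_k = scores.index(max(scores)) + 1  # smallest k that reaches the best score
    return best_k, scores


# Hypothetical toy data: 6 samples x 3 features (invented expression values).
X = [
    [5.0, 0.3, 2.1],
    [5.2, 0.9, 1.7],
    [4.8, 0.1, 2.5],
    [1.0, 0.8, 2.0],
    [1.2, 0.2, 2.4],
    [0.8, 0.5, 1.8],
]
y = ["covid", "covid", "covid", "influenza", "influenza", "influenza"]

# Hypothetical ranking, assumed to come from an upstream selection step.
ranked = [0, 2, 1]

best_k, scores = incremental_feature_selection(X, y, ranked)
print(best_k, scores)
```

Taking the smallest k that reaches the best score favours compact biomarker panels; a real analysis would more likely use k-fold cross-validation and a metric suited to imbalanced multi-class data.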

Publication types

  • Retracted Publication

MeSH terms

  • Biomarkers / analysis
  • COVID-19 / blood
  • COVID-19 / diagnosis*
  • COVID-19 / genetics*
  • COVID-19 Testing / methods*
  • Databases, Genetic
  • Gene Expression Profiling / methods
  • Humans
  • Influenza, Human
  • Machine Learning
  • Mass Screening / methods
  • Models, Theoretical
  • Respiratory Tract Infections / blood
  • Respiratory Tract Infections / diagnosis
  • SARS-CoV-2 / genetics
  • SARS-CoV-2 / pathogenicity
  • Transcriptome / genetics

Substances

  • Biomarkers