Combining Deep Phenotyping of Serum Proteomics and Clinical Data via Machine Learning for COVID-19 Biomarker Discovery

Int J Mol Sci. 2022 Aug 15;23(16):9161. doi: 10.3390/ijms23169161.

Abstract

The persistence of long-term coronavirus disease 2019 (COVID-19) sequelae demands better insight into the natural history of the disease. It is therefore crucial to discover biomarkers of disease outcome to improve clinical practice. In this study, 160 COVID-19 patients were enrolled, of whom 80 had a "non-severe" and 80 had a "severe" outcome. Sera were analyzed by proximity extension assay (PEA) to assess 274 unique proteins associated with inflammatory, cardiometabolic, and neurologic diseases. The main clinical and hematochemical data associated with disease outcome were combined with the serological data to form a dataset for supervised machine learning. We identified nine proteins (i.e., CD200R1, MCP1, MCP3, IL6, LTBP2, MATN3, TRANCE, α2-MRAP, and KIT) that contributed to the correct classification of COVID-19 disease severity when combined with relative neutrophil and lymphocyte counts. By analyzing PEA, clinical, and hematochemical data with statistical methods that can handle many variables in the presence of a relatively small sample size, we identified these nine proteins as potential serum biomarkers of a "severe" outcome. Most of them are supported by published literature. Importantly, we found three biomarkers associated with central nervous system pathologies and protective factors, which were downregulated in the most severe cases.

Keywords: COVID-19; cardiometabolic; inflammation; neurologic disease; proximity extension assay.
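
The workflow summarized in the abstract (joining protein measurements with clinical and hematochemical values per patient, then training a supervised classifier on a small cohort) can be illustrated with a minimal sketch. The abstract does not name the algorithms used, so a nearest-centroid classifier evaluated with leave-one-out cross-validation stands in here purely as an example of a simple method suited to small samples. All patient IDs, feature names (protein_A, protein_B, neutrophil_pct), and values are hypothetical toy data, not study data.

```python
import statistics

def merge_by_patient(proteins, clinical):
    """Inner-join two {patient_id: {feature: value}} tables on shared IDs."""
    shared = sorted(set(proteins) & set(clinical))
    return {pid: {**proteins[pid], **clinical[pid]} for pid in shared}

def fit_scaler(rows):
    """Per-feature mean and SD, estimated on training rows only."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return means, sds

def scale(row, means, sds):
    return [(x - m) / s for x, m, s in zip(row, means, sds)]

def nearest_centroid_predict(train_rows, train_labels, test_row):
    """Assign the label whose class centroid is closest (squared Euclidean)."""
    centroids = {}
    for label in set(train_labels):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        centroids[label] = [statistics.mean(c) for c in zip(*members)]
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: dist2(test_row, centroids[lab]))

def leave_one_out_accuracy(table, labels):
    """Hold out each patient once; scaling is refit inside every fold."""
    pids = sorted(table)
    features = sorted(table[pids[0]])
    X = {pid: [table[pid][f] for f in features] for pid in pids}
    correct = 0
    for held_out in pids:
        train = [p for p in pids if p != held_out]
        means, sds = fit_scaler([X[p] for p in train])
        train_rows = [scale(X[p], means, sds) for p in train]
        pred = nearest_centroid_predict(
            train_rows, [labels[p] for p in train], scale(X[held_out], means, sds)
        )
        correct += pred == labels[held_out]
    return correct / len(pids)

# Hypothetical toy cohort: P01-P05 "severe" (1), P06-P10 "non-severe" (0).
pids = [f"P{i:02d}" for i in range(1, 11)]
labels = {pid: int(i < 5) for i, pid in enumerate(pids)}
proteins = {
    pid: {"protein_A": (5.0 if labels[pid] else 1.0) + 0.1 * (i % 5),
          "protein_B": 0.1 * (i % 5)}  # uninformative by construction
    for i, pid in enumerate(pids)
}
clinical = {
    pid: {"neutrophil_pct": (80.0 if labels[pid] else 60.0) + (i % 5)}
    for i, pid in enumerate(pids)
}
clinical["P11"] = {"neutrophil_pct": 70.0}  # no protein data, dropped by the join

merged = merge_by_patient(proteins, clinical)
accuracy = leave_one_out_accuracy(merged, labels)
print(len(merged), accuracy)
```

Refitting the scaler inside each fold keeps the held-out patient from influencing the preprocessing, which matters when the sample is small relative to the number of variables.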

MeSH terms

  • Biomarkers / blood
  • COVID-19* / diagnosis
  • Humans
  • Lymphocyte Count
  • Machine Learning
  • Proteomics*

Substances

  • Biomarkers