Identification of critical SARS-CoV-2 amino acids associated with COVID-19 hospitalization rate using machine learning and statistical modeling: An observational study in the United States

Infect Genet Evol. 2023 Sep:113:105480. doi: 10.1016/j.meegid.2023.105480. Epub 2023 Jul 10.

Abstract

Background: The COVID-19 pandemic has pushed many medical systems to the verge of collapse over the last two years. Virus mutation was one of the important factors affecting COVID-19 infection severity and hospitalizations. Although over ten thousand SARS-CoV-2 mutations have been reported since the beginning of the COVID-19 pandemic, only a small percentage are likely to affect the virus phenotype and change its severity. Identifying which amino acids have the greatest impact on the COVID-19 hospitalization rate is therefore an important research question.

Methods: This observational study used the COVID-19 case hospitalization ratio (CHR) to represent hospitalization-related virus severity. The dataset comprises daily state-level epidemiological and genomic sequence data in the United States from the Alpha wave to the first Omicron wave. The critical amino acids that most affected the CHR were determined using four types of models: extreme gradient boosting decision trees (XGBoost), artificial neural networks (ANNs), logistic regression, and Lasso regression.
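
As an illustration of the severity measure, the following minimal sketch computes a case hospitalization ratio per state and day. It assumes the CHR is simply new hospitalizations divided by new reported cases over the same window; the abstract does not give the exact definition (for example, any lag between case report and admission), and the function name, record layout, and numbers are hypothetical.

```python
from typing import Optional


def case_hospitalization_ratio(new_hospitalizations: int, new_cases: int) -> Optional[float]:
    """Return hospitalizations divided by cases, or None when no cases were reported.

    Assumption: both counts cover the same state and the same time window.
    """
    if new_cases <= 0:
        return None  # the ratio is undefined without reported cases
    return new_hospitalizations / new_cases


# Hypothetical records: (state, date, new_cases, new_hospitalizations).
# The numbers are invented for illustration and are not taken from the study.
records = [
    ("NY", "2021-03-01", 8000, 400),
    ("NY", "2021-03-02", 0, 5),
    ("CA", "2021-03-01", 5000, 200),
]

chr_by_state_day = {
    (state, day): case_hospitalization_ratio(hosp, cases)
    for state, day, cases, hosp in records
}
```

Returning `None` for days without reported cases keeps undefined ratios out of any downstream model fit instead of silently treating them as zero.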

Results: The XGBoost, ANN, logistic regression, and Lasso regression models all produced excellent results (the mean squared error of every state-level model did not exceed 0.0008 on the testing dataset). Based on the importance ranking of all covariates, the critical amino acids most affecting the CHR were identified, including T19, L24, P25, P26, A27, A67, H69, V70, T95, G142, V143, Y145, E156, F157, N211, L212, V213, R214, D215, G339, R346, S373, L452, S477, T478, E484, N501, A570, P681, and T716.
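
One way to turn per-model importance scores into a single ranking is sketched below. It assumes a simple mean-rank aggregation across models, which the abstract does not confirm as the authors' procedure; the function names are hypothetical and the scores are invented, with only the site labels borrowed from the list above.

```python
from typing import Dict, List


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each feature to its 1-based rank; rank 1 is the largest score."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def top_features_by_mean_rank(model_scores: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Return the k features with the lowest mean rank across all models.

    Assumption: every model provides a score for the same set of features.
    Ties in mean rank are broken alphabetically by feature name.
    """
    per_model_ranks = [rank_positions(scores) for scores in model_scores.values()]
    features = sorted(per_model_ranks[0])
    mean_rank = {
        feature: sum(ranks[feature] for ranks in per_model_ranks) / len(per_model_ranks)
        for feature in features
    }
    return sorted(features, key=lambda feature: (mean_rank[feature], feature))[:k]


# Hypothetical importance scores; the values are invented for illustration.
model_scores = {
    "xgboost": {"N501": 0.40, "L452": 0.35, "P681": 0.20, "T716": 0.05},
    "lasso": {"N501": 0.30, "L452": 0.50, "P681": 0.15, "T716": 0.05},
}

top_two = top_features_by_mean_rank(model_scores, k=2)
```

Averaging ranks rather than raw scores avoids comparing importance values that live on different scales in different model families, such as tree gain versus regression coefficients.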

Conclusion: This study identified the critical amino acids most likely to affect the hospitalization rate, allowing public health workers to monitor these high-risk amino acids and raise an alarm immediately when more severe mutations occur. Furthermore, the methodology and results may be extended to other regions.

Keywords: COVID-19; Case hospitalization ratio; SARS-CoV-2 amino acid mutation.

Publication types

  • Observational Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids
  • COVID-19* / epidemiology
  • Hospitalization
  • Humans
  • Machine Learning
  • Pandemics
  • SARS-CoV-2* / genetics
  • United States / epidemiology

Substances

  • Amino Acids