Evaluation of Statistical Approaches in Developing a Predictive Model of Severe COVID-19 during Early Phase of Pandemic with Limited Data Resources

Tohoku J Exp Med. 2024 Jan 30;262(1):33-41. doi: 10.1620/tjem.2023.J090. Epub 2023 Nov 2.

Abstract

Because evidence on risk factors for severe coronavirus disease 2019 (COVID-19) was uncertain in the early phases of the pandemic, developing an efficient predictive model of severe cases to triage high-risk individuals was an urgent yet challenging task. It is crucial to select appropriate statistical models when available data and evidence are limited. This study assessed the accuracy of different statistical models in predicting severe cases using demographic data from patients with COVID-19 before the emergence of consequential variants. We analyzed data from 929 consecutive patients diagnosed with COVID-19 prior to March 2021, including their age, sex, body mass index, and past medical histories, and compared the area under the receiver operating characteristic curve (ROC AUC) across the statistical models. The random forest (RF) model, deep learning (DL) models with a limited number of neurons, and the naïve Bayes model achieved AUC values of > 0.70 on the validation datasets. The naïve Bayes model performed best, with AUC values of > 0.80. The accuracy of RF was more robust than that of DL, with a narrower distribution of AUC values. Performing feature selection on the training dataset before building models was beneficial for some models, but not for all. In summary, the naïve Bayes and RF models exhibited good predictive performance even with limited available data. The benefit of performing feature selection before building models with limited data resources depended on the machine learning method and its parameters.
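The comparison metric used above, ROC AUC, can be read as the probability that a randomly chosen severe case receives a higher predicted score than a randomly chosen non-severe case. The following is a minimal sketch of that calculation, not the authors' code: the function name `roc_auc`, the labels, and the two sets of model scores are hypothetical values made up for illustration and are not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney formulation).

    Returns the fraction of (severe, non-severe) pairs in which the
    severe case (label 1) has the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels (1 = severe) and predicted scores
# from two made-up models; these are NOT values from the study.
y_val = [0, 0, 1, 1, 0, 1]
model_a = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
model_b = [0.30, 0.60, 0.20, 0.70, 0.50, 0.40]

auc_a = roc_auc(y_val, model_a)
auc_b = roc_auc(y_val, model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

Repeating such a calculation over many train/validation splits would give a distribution of AUC values per model, which is the kind of spread the abstract refers to when it describes RF as more robust than DL.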

Keywords: coronavirus disease 2019 (COVID-19); deep learning; naïve Bayes; neural network; random forest.

MeSH terms

  • Bayes Theorem
  • Body Mass Index
  • COVID-19* / epidemiology
  • Humans
  • Neurons
  • Pandemics*