Association between biochemical and hematologic factors with COVID-19 using data mining methods

BMC Infect Dis. 2023 Dec 21;23(1):897. doi: 10.1186/s12879-023-08676-0.

Abstract

Background and aim: Coronavirus disease (COVID-19) is an infectious disease that can spread very rapidly and has important public health impacts. Identifying the factors associated with infection is helpful to health care workers. The aim of this research was to identify the demographic, biochemical, and hematological features that best distinguish patients with COVID-19 infection from those without it.

Method: A total of 13,170 participants aged 35-65 years were recruited. Decision Tree (DT), Logistic Regression (LR), and Bootstrap Forest (BF) techniques were fitted to the data. Three models were considered in this study: Model I used the biochemical features, Model II used the hematological features, and Model III used both the biochemical and hematological features.
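
As a rough illustration of the three-model design, the sketch below groups the variables named in this abstract into feature sets. The abstract does not give the study's full variable lists, so the lists here are incomplete. Including the demographic variables (age, sex, BMI) in every model is an assumption inferred from their appearance among the predictors reported for each model. The names `DEMOGRAPHIC`, `BIOCHEMICAL`, `HEMATOLOGICAL`, and `build_model_features` are hypothetical.

```python
# Hypothetical feature groups, limited to variables named in the abstract.
# The study's complete variable lists are not given there.
DEMOGRAPHIC = ["age", "sex", "BMI"]
BIOCHEMICAL = ["CPK", "BUN", "FBG", "total_bilirubin", "Cr"]
HEMATOLOGICAL = ["MPV"]


def build_model_features():
    """Return the candidate feature list for each of the three models.

    Assumption: demographic variables enter every model, since age, sex,
    and BMI are reported as predictors in Models I, II, and III.
    """
    return {
        "Model I": DEMOGRAPHIC + BIOCHEMICAL,
        "Model II": DEMOGRAPHIC + HEMATOLOGICAL,
        "Model III": DEMOGRAPHIC + BIOCHEMICAL + HEMATOLOGICAL,
    }


MODEL_FEATURES = build_model_features()

for model_name, columns in MODEL_FEATURES.items():
    print(model_name, columns)
```

Each feature list would then be passed to the DT, LR, and BF fitting routines in turn, giving nine fitted classifiers in total.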

Results: In Model I, the BF, DT, and LR algorithms identified creatine phosphokinase (CPK), blood urea nitrogen (BUN), fasting blood glucose (FBG), total bilirubin, body mass index (BMI), sex, and age as important predictors for COVID-19. In Model II, the BF, DT, and LR algorithms identified BMI, sex, mean platelet volume (MPV), and age as important predictors. In Model III, the BF, DT, and LR algorithms identified CPK, BMI, MPV, BUN, FBG, sex, creatinine (Cr), age, and total bilirubin as important predictors.

Conclusion: The proposed BF, DT, and LR models appear to be able to predict and classify infected and non-infected people based on CPK, BUN, BMI, MPV, FBG, sex, Cr, and age, which had a high association with COVID-19.

Keywords: Biochemical; COVID-19; Data mining; Decision trees; Hematologic; SARS-COV-2.

MeSH terms

  • Adult
  • Aged
  • Algorithms
  • Bilirubin
  • COVID-19*
  • Data Mining / methods
  • Humans
  • Middle Aged
  • SARS-CoV-2

Substances

  • Bilirubin