Vaccine hesitancy in the post-vaccination COVID-19 era: a machine learning and statistical analysis driven study

Evol Intell. 2023;16(3):739-757. doi: 10.1007/s12065-022-00704-3. Epub 2022 Mar 9.

Abstract

Background: The COVID-19 pandemic has badly affected people of all ages globally. Vaccines were therefore developed and made available for public use with unprecedented speed. However, because of varying levels of hesitancy, they did not gain general acceptance. The main objective of this work is to identify the risk associated with the COVID-19 vaccines by developing a prognosis tool that will help enhance their acceptability and thereby reduce the lethality of SARS-CoV-2.

Methods: The raw VAERS dataset consists of three files covering medical history, vaccination status, and post-vaccination symptoms, respectively, with more than 354 thousand samples. After pre-processing, the three files were merged into a single dataset with 85 attributes. The analysis was subdivided into three scenarios: (i) medical history, (ii) reaction to vaccination, and (iii) a combination of both. Machine Learning (ML) models, including Linear Regression (LR), Random Forest (RF), Naive Bayes (NB), Light Gradient Boosting Algorithm (LGBM), and Multilayer feed-forward perceptron (MLP), were employed to predict the most probable outcome, and their performance was evaluated on various performance parameters. In addition, the chi-square test (statistical), LR, RF, and LGBM were used to estimate the attributes in the dataset most likely to have resulted in death, hospitalization, and COVID-19.

Results: For the above-mentioned scenarios, the models identify different attributes (such as cardiac arrest, cancer, hyperlipidemia, kidney disease, diabetes, atrial fibrillation, dementia, thyroid, etc.) for death, hospitalization, and COVID-19 even after vaccination. For prediction, LGBM outperforms all the other developed models in most of the scenarios, whereas LR, RF, NB, and MLP perform satisfactorily only in some of them.

Conclusion: The male population in the 50-70 age group was found to be the most susceptible to this virus. People with existing serious illnesses were also found to be the most vulnerable and should therefore be vaccinated under close observation. In general, no serious adverse effect of the vaccine was observed, so people should get vaccinated at the earliest opportunity, without hesitation. The model developed using LGBM outperforms all the other prediction models and can therefore help policymakers administer vaccination programs and prioritize the population for them.
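
As an illustration of the chi-square attribute ranking mentioned in the Methods, the following is a minimal sketch and not the authors' code. It assumes each attribute and each outcome has been encoded as 0/1; the function names, the record layout, and the toy data are hypothetical, and a real analysis would more likely rely on a statistics library and the full merged dataset.

```python
from collections import Counter


def chi_square_2x2(attribute, outcome):
    """Pearson chi-square statistic (no continuity correction)
    for two equally long sequences of 0/1 values."""
    n = len(attribute)
    counts = Counter(zip(attribute, outcome))
    # Marginal totals of the 2x2 contingency table.
    row = {a: counts[(a, 0)] + counts[(a, 1)] for a in (0, 1)}
    col = {o: counts[(0, o)] + counts[(1, o)] for o in (0, 1)}
    stat = 0.0
    for a in (0, 1):
        for o in (0, 1):
            expected = row[a] * col[o] / n
            if expected > 0:  # skip cells with an empty margin
                stat += (counts[(a, o)] - expected) ** 2 / expected
    return stat


def rank_attributes(records, outcome_key):
    """Rank every attribute by its chi-square statistic against
    the outcome column, strongest association first."""
    outcome = [r[outcome_key] for r in records]
    names = [k for k in records[0] if k != outcome_key]
    scores = [(k, chi_square_2x2([r[k] for r in records], outcome))
              for k in names]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy records (hypothetical, not taken from VAERS).
records = [
    {"diabetes": 1, "thyroid": 1, "died": 1},
    {"diabetes": 1, "thyroid": 0, "died": 1},
    {"diabetes": 0, "thyroid": 1, "died": 0},
    {"diabetes": 0, "thyroid": 0, "died": 0},
]
ranking = rank_attributes(records, "died")
```

A larger statistic indicates a stronger association between an attribute and the outcome in the sample; it does not by itself establish that the attribute caused the outcome.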

Keywords: COVID-19; Machine Learning; Predictive Analysis; SARS-CoV-2; Statistical Analysis; VAERS.