Prediction modelling of COVID using machine learning methods from B-cell dataset

Results Phys. 2021 Feb:21:103813. doi: 10.1016/j.rinp.2021.103813. Epub 2021 Jan 17.

Abstract

Coronavirus disease is a pandemic that has become a concern for the whole world. The disease has reached its greatest extent so far and is still expanding day by day. It has caused more than 800,000 (8 lakh) deaths worldwide. The foremost causes of its spread are SARS-CoV and SARS-CoV-2, which are part of the coronavirus family. Predicting which patients are suffering from such pandemic diseases, accurately and within a feasible time, would therefore help make a difference. This paper focuses on the prediction of SARS-CoV and SARS-CoV-2 using the B-cell dataset. It also proposes different ensemble learning strategies that proved beneficial when making predictions. Several machine learning models are used to analyze the dataset and make predictions: SVM, Naïve Bayes, K-nearest neighbors, AdaBoost, gradient boosting, XGBoost, random forest, ensembles, and neural networks. The most accurate result was obtained using the proposed algorithm, with a 0.919 AUC score and 87.248% validation accuracy for predicting SARS-CoV, and a 0.923 AUC score and 87.7934% validation accuracy for predicting SARS-CoV-2.

Keywords: AdaBoost; B-cells; COVID-19; Coronavirus; Ensembles; Gradient boosting; K-nearest neighbors (KNN); Logistic regression; Multilayer perceptron (MLP); Naïve Bayes; Random forest; SARS-CoV; SARS-CoV-2; Support vector machine (SVM); XGBoost.
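
The abstract reports results for ensembles of classifiers, scored by AUC and validation accuracy. The sketch below illustrates one common ensemble strategy, soft voting (averaging each model's predicted probability), together with the two metrics. It is not the paper's proposed algorithm, which the abstract does not specify. The function names (`soft_vote`, `roc_auc`, `accuracy`), the 0.5 decision threshold, and the toy labels and scores are all assumptions invented for illustration; they are not taken from the B-cell dataset.

```python
from statistics import mean


def soft_vote(model_probs):
    """Average the positive-class probabilities of several models, per sample."""
    return [mean(sample) for sample in zip(*model_probs)]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical validation labels (1 = positive class) and
# hypothetical per-model probabilities; invented for this sketch.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.4, 0.3, 0.2, 0.8, 0.6]
model_b = [0.7, 0.2, 0.6, 0.5, 0.9, 0.3]
model_c = [0.8, 0.6, 0.7, 0.1, 0.4, 0.2]

ensemble = soft_vote([model_a, model_b, model_c])

for name, scores in [("model_a", model_a), ("ensemble", ensemble)]:
    print(name, round(roc_auc(labels, scores), 3), round(accuracy(labels, scores), 3))
```

In this toy setup each individual model misranks some samples, while the averaged scores rank every positive above every negative. That is the basic motivation for ensembling: errors that differ between models can cancel out. Real validation data would not usually separate this cleanly.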