Prediction modelling of COVID using machine learning methods from B-cell dataset

Nikita Jain; Srishti Jhunthra; Harshit Garg; Vedika Gupta; Senthilkumar Mohan; Ali Ahmadian; Soheil Salahshour; Massimiliano Ferrara

doi:10.1016/j.rinp.2021.103813

Results Phys. 2021 Feb:21:103813. doi: 10.1016/j.rinp.2021.103813. Epub 2021 Jan 17.

Authors

Nikita Jain¹, Srishti Jhunthra¹, Harshit Garg¹, Vedika Gupta¹, Senthilkumar Mohan², Ali Ahmadian³,⁴, Soheil Salahshour⁵, Massimiliano Ferrara⁶

Affiliations

¹ Department of Computer Science & Engineering, Bharati Vidyapeeth's College of Engineering, 110063 New Delhi, India.
² School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, India.
³ Institute of IR 4.0, The National University of Malaysia, Bangi 43600 UKM, Selangor, Malaysia.
⁴ School of Mathematical Sciences, College of Science and Technology, Wenzhou-Kean University, Wenzhou, China.
⁵ Faculty of Engineering and Natural Sciences, Bahcesehir University, Istanbul, Turkey.
⁶ ICRIOS - The Invernizzi Centre for Research in Innovation, Organization, Strategy and Entrepreneurship, Bocconi University - Department of Management and Technology, Via Sarfatti, 25, Milano (MI) 20136, Italy.

Abstract

Coronavirus disease has become a pandemic and a concern for the whole world. It has spread widely and continues to expand day by day, causing more than 8 lakh (800,000) deaths worldwide. The main causes of its spread are SARS-CoV and SARS-CoV-2, which belong to the coronavirus family. Predicting which patients are suffering from such pandemic diseases would therefore help to identify them accurately and within a feasible time. This paper mainly focuses on the prediction of SARS-CoV and SARS-CoV-2 using the B-cells dataset. The paper also proposes different ensemble learning strategies that proved beneficial for making predictions. Predictions are made, and the dataset is analyzed, using a range of machine learning models: SVM, Naïve Bayes, K-nearest neighbors, AdaBoost, Gradient boosting, XGBoost, Random forest, ensembles, and neural networks. The most accurate result was obtained using the proposed algorithm, with a 0.919 AUC score and 87.248% validation accuracy for predicting SARS-CoV, and a 0.923 AUC score and 87.7934% validation accuracy for predicting SARS-CoV-2.
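The abstract reports results as AUC and validation accuracy for an ensemble of classifiers, but does not specify the proposed algorithm. As a rough illustration only, the sketch below shows one common ensemble strategy, soft voting (averaging the positive-class probabilities of several models), together with the two metrics. It is not the authors' method. The function names, the three "models", and all numbers are hypothetical toy values, not taken from the B-cell dataset.

```python
# Illustrative sketch: soft-voting ensemble plus AUC and accuracy.
# All data below is made-up toy data, NOT from the paper or the B-cell dataset.

def soft_vote(prob_lists):
    """Average per-sample positive-class probabilities across models."""
    n_models = len(prob_lists)
    return [sum(sample_probs) / n_models for sample_probs in zip(*prob_lists)]

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)

def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample (ties count as half)."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = positive, 0 = negative) for six samples.
y_true = [1, 0, 1, 0, 1, 0]

# Hypothetical probability outputs of three base classifiers.
model_a = [0.9, 0.4, 0.6, 0.7, 0.8, 0.1]
model_b = [0.7, 0.2, 0.3, 0.4, 0.9, 0.3]
model_c = [0.8, 0.6, 0.7, 0.2, 0.4, 0.2]

ensemble = soft_vote([model_a, model_b, model_c])

for name, probs in [("model_a", model_a), ("ensemble", ensemble)]:
    print(name, "AUC:", round(roc_auc(y_true, probs), 3),
          "accuracy:", round(accuracy(y_true, probs), 3))
```

In this toy setup each base model misranks or misclassifies a different sample, so averaging their probabilities cancels the individual errors. Real data will not generally behave this cleanly, and any ensemble gain should be measured on a held-out validation set.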

Keywords: AdaBoost; B-cells; COVID-19; Coronavirus; Ensembles; Gradient boosting; K-nearest neighbors (KNN); Logistic regression; Multilayer perceptron (MLP); Naïve Bayes; Random forest; SARS-CoV; SARS-CoV-2; Support vector machine (SVM); XGBoost.