SCLAVOEM: hyper parameter optimization approach to predictive modelling of COVID-19 infodemic tweets using smote and classifier vote ensemble

Taiwo Olaleye; Adebayo Abayomi-Alli; Kayode Adesemowo; Oluwasefunmi Tale Arogundade; Sanjay Misra; Utku Kose

doi:10.1007/s00500-022-06940-0


Soft Comput. 2023;27(6):3531-3550. doi: 10.1007/s00500-022-06940-0. Epub 2022 Mar 15.

Authors

Taiwo Olaleye¹, Adebayo Abayomi-Alli², Kayode Adesemowo³, Oluwasefunmi Tale Arogundade², Sanjay Misra⁴, Utku Kose⁵

Affiliations

¹ Computer Centre and Services, Federal College of Education, Abeokuta, Nigeria.
² Department of Computer Science, Federal University of Agriculture, Abeokuta, Nigeria.
³ Nelson Mandela University, Port Elizabeth, South Africa.
⁴ Department of Electrical and Information Engineering, Covenant University, Ota, Nigeria.
⁵ Department of Computer Engineering, Suleyman Demirel University, Isparta, Turkey.

Abstract

Fake COVID-19 tweets are dangerous because they are misinformative or completely inaccurate, and they threaten efforts to flatten the pandemic curve. Thus, alongside the COVID-19 pandemic itself, fake news and myths about the virus constitute an infodemic, which must be tackled by ensuring that only valid information circulates. In this context, this study proposed the Synthetic Minority Over-Sampling Technique (SMOTE) and classifier vote ensemble (SCLAVOEM) method as a fake news classifier and a hyperparameter optimization approach for predictive modelling of COVID-19 infodemic tweets. Hyperparameter optimization variables were deployed at specific points of the proposed model, and minority oversampling was applied to training sets with imbalanced class representations. In experiments on COVID-19 infodemic prediction, SCLAVOEM returned weighted averages of 0.999 for F-measure and 1.000 for area under the curve (AUC). With SMOTE, performance increases of 3.74% and 1.11%; 5.05% and 0.29%; and 4.59% and 8.05% were observed on three different data sets. Ultimately, SCLAVOEM provided a framework for predictively detecting 'fake tweets' and three classifiers: 'positive', 'negative' and 'click-trap' (piège à clics). The model is expected to automatically flag fake information on Twitter, thereby protecting the public from inaccurate information and information overload.
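The two ideas named in the abstract, minority oversampling and a classifier vote, can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not give the tooling, the feature pipeline, the base classifiers, the voting rule, or the tuned hyperparameters, so the function names `smote` and `majority_vote`, the neighbour count `k`, the hard-vote rule, and the toy bag-of-words vectors below are all assumptions made for illustration.

```python
import random
from collections import Counter


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def majority_vote(predictions):
    """Hard vote (an assumed rule): the label most classifiers predicted."""
    return Counter(predictions).most_common(1)[0][0]


# Toy bag-of-words count vectors (hypothetical data, not from the study).
fake = [[3, 0, 1], [2, 1, 0], [4, 0, 2]]  # minority class
real = [[0, 3, 1], [1, 4, 0], [0, 2, 2], [1, 3, 1], [0, 5, 1], [1, 2, 3]]

# Oversample the minority class of the training set until classes are balanced.
balanced_fake = fake + smote(fake, n_new=len(real) - len(fake))

# Combine the outputs of three hypothetical base classifiers for one tweet.
label = majority_vote(["fake", "real", "fake"])
```

In a setup like this, oversampling would be applied only to the training split, as the abstract describes, so that synthetic points never leak into evaluation data.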

Keywords: Bag-of-words; COVID-19; Ensemble machine learning; Fake news; Infodemic; Parameter optimization; Tweet; Twitter.