SCLAVOEM: hyper parameter optimization approach to predictive modelling of COVID-19 infodemic tweets using SMOTE and classifier vote ensemble

Soft comput. 2023;27(6):3531-3550. doi: 10.1007/s00500-022-06940-0. Epub 2022 Mar 15.

Abstract

Fake COVID-19 tweets are dangerous because they are misinformative or completely inaccurate, and they threaten efforts to flatten the pandemic curve. Thus, aside from the COVID-19 pandemic itself, dealing with fake news and myths about the virus constitutes an infodemic issue, which must be tackled by ensuring that only valid information circulates. In this context, this study proposed the Synthetic Minority Over-Sampling Technique (SMOTE) and classifier vote ensemble (SCLAVOEM) method as a fake news classifier and a hyper parameter optimization approach for predictive modelling of COVID-19 infodemic tweets. Hyper parameter optimization variables were deployed at specific points of the proposed model, and minority oversampling was applied to training sets with imbalanced class representations. In experiments on COVID-19 infodemic prediction, SCLAVOEM returned weighted averages of 0.999 for F-measure and 1.000 for area under the curve (AUC). With SMOTE, performance increases of 3.74% and 1.11%; 5.05% and 0.29%; and 4.59% and 8.05% were observed on three different data sets. Overall, SCLAVOEM provided a framework for predictive detection of 'fake tweets' and three classifiers: 'positive', 'negative' and 'click-trap' (piège à clics). The model is expected to flag fake information on Twitter automatically, thereby protecting the public from inaccurate information and information overload.
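The two building blocks named in the abstract, minority oversampling by interpolation and a classifier vote, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the base classifiers, the voting rule, or the tuned hyper parameters, so the function names `smote` and `hard_vote`, the neighbour count `k`, the toy feature vectors, and the labels "fake" and "real" are all assumptions made for illustration. A hard (majority) vote is assumed here.

```python
import random
from collections import Counter


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points (illustrative SMOTE-style sketch).

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Assumes at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def hard_vote(predictions):
    """Return the majority label among the base classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]


# Toy minority-class feature vectors (hypothetical, e.g. bag-of-words scores).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote(minority, n_new=4)
balanced_minority = minority + new_points

# Three hypothetical base classifiers vote on one tweet.
vote = hard_vote(["fake", "real", "fake"])
```

In this sketch oversampling is applied only to training data, as the abstract describes, so that evaluation data stay free of synthetic points. In practice, library implementations such as `SMOTE` in imbalanced-learn and `VotingClassifier` in scikit-learn would normally replace these hand-written helpers.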

Keywords: Bag-of-words; COVID-19; Ensemble machine learning; Fake news; Infodemic; Parameter optimization; Tweet; Twitter.