A deep multi-view imbalanced learning approach for identifying informative COVID-19 tweets from social media

Comput Biol Med. 2023 Sep:164:107232. doi: 10.1016/j.compbiomed.2023.107232. Epub 2023 Jul 8.

Abstract

Social media platforms such as Twitter are a primary venue for the rapid sharing of COVID-19-related information on the Internet and have therefore become a valuable data resource for many downstream applications. Because a massive volume of COVID-19 tweets is generated every day, machine-learning-supported downstream applications must be able to skip uninformative tweets effectively and retain only the informative ones for further use. However, existing solutions do not specifically consider the negative effect of the imbalanced ratio between informative and uninformative tweets in the training data. Moreover, most existing solutions rely on single-view learning and neglect the rich information that different views can contribute to learning. In this study, a novel deep imbalanced multi-view learning approach called D-SVM-2K is proposed to identify informative COVID-19 tweets on social media. The approach builds on the well-known multi-view learning method SVM-2K to incorporate different views generated by different feature extraction techniques. To combat the class imbalance problem and enhance its learning ability, D-SVM-2K stacks multiple SVM-2K base classifiers in a deep architecture, in which each base classifier learns from either the original training dataset or the shifted critical regions identified using the well-known k-nearest neighbors algorithm. D-SVM-2K also performs global and local deep ensemble learning on the data from the multiple views. Empirical experiments on a real-world labeled tweet dataset demonstrate the effectiveness of D-SVM-2K in handling real-world multi-view class imbalance.
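The abstract does not define the critical region precisely. A common convention in imbalanced learning, assumed here for illustration only, treats a training sample as critical (borderline) when at least one of its k nearest neighbors carries a different class label. The minimal NumPy sketch below implements that selection rule; the function name `critical_region` and the parameter `k` are hypothetical and are not taken from the paper. One plausible reading of "shifted" is that, in a stacked architecture, this selection is recomputed at each layer on features augmented with the previous layer's outputs, so the region moves as the model deepens; that reading is also an assumption.

```python
import numpy as np


def critical_region(X, y, k=5):
    """Return indices of samples whose k nearest neighbors (excluding the
    sample itself) include at least one sample of a different class.

    Hypothetical helper: this borderline criterion is an assumed stand-in
    for the paper's kNN-based critical region, not its exact definition.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of samples")
    # Pairwise squared Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # Exclude each sample from its own neighborhood.
    np.fill_diagonal(d2, np.inf)
    # Indices of the k nearest neighbors of every sample.
    nn = np.argsort(d2, axis=1)[:, :k]
    # A sample is "critical" if any of its neighbors has a different label.
    mixed = (y[nn] != y[:, None]).any(axis=1)
    return np.flatnonzero(mixed)


if __name__ == "__main__":
    # Toy one-dimensional data: the two classes meet between x=2 and x=3.
    X = [[0], [1], [2], [3], [10], [11], [12]]
    y = [0, 0, 0, 1, 1, 1, 1]
    print(critical_region(X, y, k=2))  # → [2 3]
```

Training some base classifiers only on such a region gives them a subset concentrated near the class boundary, where the minority class is typically better represented than in the full dataset; this is the usual motivation for kNN-based region selection in imbalanced learning.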

Keywords: Imbalanced learning; Multiview learning; Stacked architecture; Support vector machines; Tweets data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • COVID-19*
  • Humans
  • Information Dissemination
  • Machine Learning
  • Social Media*