A Novel Hybrid Multi-Modal Deep Learning for Detecting Hashtag Incongruity on Social Media

Sajad Dadgar; Mehdi Neshat

doi:10.3390/s22249870

Sensors (Basel). 2022 Dec 15;22(24):9870. doi: 10.3390/s22249870.

Authors

Sajad Dadgar¹, Mehdi Neshat²,³

Affiliations

¹ Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran 15875-4413, Iran.
² Adjunct Research Fellow at Center for Artificial Intelligence Research and Optimization, Torrens University Australia, Brisbane, QLD 4006, Australia.
³ Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW 2007, Australia.

Abstract

Hashtags have long been an integral element of social media platforms and are widely used to promote and organize content and to connect users. Despite their intensive use, there is no established basis for choosing congruous tags, so hashtag searches return a large amount of unrelated content. The presence of mismatched content under a hashtag creates many problems for individuals and brands. Although several methods address the problem by recommending hashtags based on users' interests, detecting and characterizing this repetitive content with irrelevant hashtags has rarely been addressed. To this end, we propose a novel hybrid deep learning method for hashtag incongruity detection that fuses the visual and textual modalities. We fine-tune pre-trained BERT and ResNet50 models to encode textual and visual information simultaneously. We further attempt to show the capability of logo detection and face recognition to discriminate between images. To extract faces, we introduce a pipeline that uses face clustering to rank faces by the number of times they appear on Instagram accounts. Moreover, we conduct our analysis and experiments on a dataset of Instagram posts that we collected from hashtags related to brands and celebrities. Unlike existing work, we analyze these posts from both content and user perspectives and show a significant difference between the data. Our results show that our multimodal model outperforms other models and that object detection is effective in detecting mismatched information.
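
The face-ranking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that an upstream face clustering step (not shown) has already assigned a cluster label to every detected face. The function name `rank_faces_by_frequency`, the label format and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def rank_faces_by_frequency(
    cluster_labels: Iterable[Hashable],
) -> List[Tuple[Hashable, int]]:
    """Rank face clusters by how many detected faces fall into each.

    `cluster_labels` holds one label per detected face across an account's
    images (assumed output of a clustering step that is not shown here).
    """
    counts = Counter(cluster_labels)
    # Sort by descending count; break ties by label text for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))


# Hypothetical labels for faces detected in one account's images.
labels = ["face_a", "face_b", "face_a", "face_c", "face_a", "face_b"]
ranking = rank_faces_by_frequency(labels)
print(ranking)
```

Under these assumptions, the first entry of the ranking is the face that appears most often on the account, which is one plausible way to pick out the account's main subject.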

Keywords: XGBoost; fine-tuning; hashtags; hybrid deep learning models; image-text multimodal classification; machine learning models; object detection; social media analysis; stacking ensemble.

MeSH terms

Deep Learning*
Humans
Social Media*

Grants and funding

The authors received no financial support for the research, authorship and/or publication of this article.