Exploring COVID-related relationship extraction: Contrasting data sources and analyzing misinformation

Tanvi Sharma; Amer Farea; Nadeesha Perera; Frank Emmert-Streib

doi:10.1016/j.heliyon.2024.e26973


Heliyon. 2024 Feb 28;10(5):e26973. doi: 10.1016/j.heliyon.2024.e26973. eCollection 2024 Mar 15.

Authors

Tanvi Sharma¹, Amer Farea¹, Nadeesha Perera¹, Frank Emmert-Streib¹

Affiliation

¹ Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland.

Abstract

The COVID-19 pandemic presented an unparalleled challenge to global healthcare systems. A central issue is the urgent need to rapidly gather critical biological and medical knowledge about the disease, its treatment, and its containment. Yet text data remains an underutilized resource in this context. In this paper, we study the extraction of COVID-related relations using transformer-based language models, including Bidirectional Encoder Representations from Transformers (BERT) and DistilBERT. We compare the performance of five language models on information drawn from PubMed and Reddit, and we assess their ability to make novel predictions, including the detection of "misinformation." Our key finding is that, despite their inherent differences, PubMed and Reddit data contain remarkably similar information. This suggests that Reddit can serve as a valuable resource for rapidly acquiring information in times of crisis. Furthermore, our results show that language models can uncover previously unseen entities and relations, which is crucial for identifying instances of misinformation.

Keywords: Artificial intelligence; Data science; Deep learning; Misinformation; Natural language processing; Public health; Relation extraction.