An aspect-level sentiment analysis dataset for therapies on Twitter

Yuting Guo; Sudeshna Das; Sahithi Lakamana; Abeed Sarker

doi:10.1016/j.dib.2023.109618

Data Brief. 2023 Sep 23:50:109618. doi: 10.1016/j.dib.2023.109618. eCollection 2023 Oct.

Authors

Yuting Guo¹, Sudeshna Das¹, Sahithi Lakamana¹, Abeed Sarker¹

Affiliation

¹ Emory University, Atlanta, Georgia 30322, USA.

Abstract

The dataset described is an aspect-level sentiment analysis dataset for therapies, including medication, behavioral, and other therapies, created from user-generated text on Twitter. The dataset was constructed by collecting Twitter posts using keywords associated with the therapies (often referred to as treatments). Subsets of the collected posts were then manually reviewed, and annotation guidelines were developed to categorize the sentiment each post expresses toward the therapy as positive, negative, or neutral. The dataset contains a total of 5364 posts mentioning 32 therapies. These posts were manually categorized as 998 (18.6%) positive, 619 (11.5%) negative, and 3747 (69.9%) neutral. Inter-annotator agreement for the dataset was evaluated using Cohen's kappa, which yielded a score of 0.82. The dataset is intended to support the development of automatic systems that detect users' sentiments toward therapies from their posts. While other sentiment analysis datasets are available, this is the first that encodes sentiments associated with specific therapies. Researchers and developers can use this dataset to train sentiment analysis models, natural language processing algorithms, or machine learning systems to identify and analyze the sentiments expressed by consumers on social media platforms such as Twitter.
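The reported class shares and the agreement statistic can be illustrated with a short sketch. It recomputes the percentages from the counts given above and shows how Cohen's kappa is calculated for two annotators. The function name cohen_kappa and the two toy label lists are hypothetical and used only for illustration; they are not drawn from the dataset and do not reproduce the reported 0.82 score.

```python
from collections import Counter

# Class counts as reported in the abstract.
counts = {"positive": 998, "negative": 619, "neutral": 3747}
total = sum(counts.values())
# Percentage share of each class, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    all_labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[l] * freq_b[l] for l in all_labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations (NOT taken from the dataset).
annotator_1 = ["neutral", "neutral", "positive", "negative", "neutral", "positive"]
annotator_2 = ["neutral", "neutral", "positive", "neutral", "neutral", "positive"]
kappa = cohen_kappa(annotator_1, annotator_2)

print(total, shares)
print(round(kappa, 2))
```

Because roughly 70% of the posts are neutral, chance agreement between annotators is high, which is why a chance-corrected measure such as kappa is more informative here than raw percent agreement.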

Keywords: Biomedical informatics; Machine learning; Natural language processing; Sentiment analysis; Text classification; Therapy.

Grants and funding

R01 DA057599/DA/NIDA NIH HHS/United States