Identifying self-disclosed anxiety on Twitter: A natural language processing approach

Daniel Zarate; Michelle Ball; Maria Prokofieva; Vassilis Kostakos; Vasileios Stavropoulos

doi:10.1016/j.psychres.2023.115579

Psychiatry Res. 2023 Dec:330:115579. doi: 10.1016/j.psychres.2023.115579. Epub 2023 Nov 3.

Authors

Daniel Zarate¹, Michelle Ball², Maria Prokofieva², Vassilis Kostakos³, Vasileios Stavropoulos⁴

Affiliations

¹ College of Health and Biomedicine, Royal Melbourne Institute of Technology (RMIT), Australia. Electronic address: Daniel.zarate@live.vu.edu.au.
² Institute for Health and Sport, Victoria University, Melbourne, Australia.
³ University of Melbourne, Melbourne, Australia.
⁴ College of Health and Biomedicine, Royal Melbourne Institute of Technology (RMIT), Australia; Department of Psychology, University of Athens, Athens, Greece.

PMID: 37956589
DOI: 10.1016/j.psychres.2023.115579

Abstract

Background: Text analyses of social media posts are a promising source of mental health information. This study used natural language processing to explore distinct language patterns on Twitter related to self-reported anxiety diagnosis.

Methods: A total of 233,000 tweets made by 605 users (300 reporting an anxiety diagnosis and 305 not) over six months were comparatively analysed, considering user behavior, Linguistic Inquiry and Word Count (LIWC), and sentiment analysis. Twitter users with a self-disclosed diagnosis of anxiety were classified as 'anxious' to facilitate group comparisons.
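To illustrate the kind of per-user aggregation the Methods describe, the following is a minimal sketch only. It assumes tweets arrive as (user, text) pairs and uses small made-up word lists in place of LIWC and a real sentiment tool; the names user_features, POSITIVE, NEGATIVE and FIRST_PERSON are hypothetical and not taken from the study.

```python
from collections import defaultdict

# Hypothetical toy lexicons (assumption): the study used LIWC and
# sentiment analysis tools, not these word lists.
POSITIVE = {"happy", "great", "love", "good"}
NEGATIVE = {"worried", "afraid", "sad", "panic"}
FIRST_PERSON = {"i", "me", "my", "myself"}


def user_features(tweets):
    """Aggregate (user, text) pairs into one feature dict per user."""
    by_user = defaultdict(list)
    for user, text in tweets:
        by_user[user].append(text.lower().split())

    features = {}
    for user, docs in by_user.items():
        # Flatten all tokens for the user and trim trailing punctuation.
        words = [w.strip(".,!?") for doc in docs for w in doc]
        total = len(words) or 1  # guard against division by zero
        features[user] = {
            "n_tweets": len(docs),
            "mean_words_per_tweet": len(words) / len(docs),
            "positive_rate": sum(w in POSITIVE for w in words) / total,
            "negative_rate": sum(w in NEGATIVE for w in words) / total,
            "first_person_rate": sum(w in FIRST_PERSON for w in words) / total,
        }
    return features


# Made-up example tweets, for illustration only.
sample = [
    ("u1", "I am so worried about my exam"),
    ("u1", "Panic again, I cannot sleep"),
    ("u2", "Great day, love this city!"),
]
feats = user_features(sample)
```

Each user ends up with one row of behavioural and lexical rates, which is the shape of input a group comparison or classifier would typically need.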

Results: Supervised machine learning models showed a high prediction accuracy (Naïve Bayes 81.1 %, Random Forests 79.8 %, and LASSO-regression 79.4 %) in identifying Twitter users' self-disclosed diagnosis of anxiety. Additionally, a Latent Profile Analysis (LPA) identified four profiles characterized by high sentiment (31 % anxious participants), low sentiment (68 % anxious), self-immersed (80 % anxious), and normative behavior (38 % anxious).
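As a rough illustration of how a Naïve Bayes classifier could separate users on such features, here is a minimal Gaussian Naïve Bayes sketch in plain Python. The abstract does not state which Naïve Bayes variant, features, or validation scheme were used, so the Gaussian form, the three feature columns, and all data values below are assumptions made up for the example; the resulting accuracy has no relation to the reported figures.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and variances."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant keeps variances non-zero (an illustrative choice).
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one row."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    return max(model, key=lambda label: log_posterior(model[label]))


# Made-up rows (assumption): [negative_rate, positive_rate, tweets_per_week]
X = [[0.10, 0.02, 3], [0.12, 0.01, 2], [0.09, 0.03, 4],
     [0.02, 0.08, 9], [0.03, 0.10, 8], [0.01, 0.09, 10]]
y = ["anxious", "anxious", "anxious", "control", "control", "control"]

model = fit_gaussian_nb(X, y)
# Accuracy on the training rows themselves; a real evaluation would use held-out users.
accuracy = sum(predict(model, r) == t for r, t in zip(X, y)) / len(y)
```

The toy groups are deliberately well separated, so this only shows the mechanics of fitting and predicting, not realistic performance.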

Conclusion: The digital footprint of self-disclosed anxiety in Twitter posts was characterized by a high frequency of words conveying negative sentiment, a low frequency of words conveying positive sentiment, a reduced frequency of posting, and lengthier texts. These distinct patterns enabled highly accurate prediction of self-disclosed anxiety diagnosis. On this basis, appropriately resourced, awareness-raising online mental health campaigns are advocated.

Keywords: Anxiety; Cyber-phenotype; Digital footprint; Natural language processing; Sentiment analysis; Twitter.

MeSH terms

Anxiety / diagnosis
Anxiety Disorders
Bayes Theorem
Humans
Natural Language Processing*
Social Media*