Augmenting Reddit Posts to Determine Wellness Dimensions impacting Mental Health

Chandreen Liyanage; Muskan Garg; Vijay Mago; Sunghwan Sohn

doi:10.18653/v1/2023.bionlp-1.27

Proc Conf Assoc Comput Linguist Meet. 2023 Jul:2023:306-312. doi: 10.18653/v1/2023.bionlp-1.27.

Authors

Chandreen Liyanage¹, Muskan Garg², Vijay Mago¹, Sunghwan Sohn²

Affiliations

¹ Lakehead University, Thunder Bay, ON P7B 5E1, Canada.
² Mayo Clinic, Rochester, MN 55901, USA.

PMID: 38384674
PMCID: PMC10878427 (available on 2024-07-01)
DOI: 10.18653/v1/2023.bionlp-1.27

Abstract

Amid an ongoing health crisis, there is a growing need to discern possible signs of Wellness Dimensions (WD) manifested in self-narrated text. As the distribution of WD in social media data is intrinsically imbalanced, we experiment with generative NLP models for data augmentation to enable further improvement in the pre-screening task of classifying WD. To this end, we propose a simple yet effective data augmentation approach through prompt-based generative NLP models, and evaluate the ROUGE scores and the syntactic/semantic similarity between existing interpretations and augmented data. Our approach with the ChatGPT model surpasses all the other methods and achieves improvement over baselines such as Easy Data Augmentation and Backtranslation. Introducing data augmentation to generate more training samples and a balanced dataset improves the F-score and the Matthews Correlation Coefficient by up to 13.11% and 15.95%, respectively.
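For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how the F-score and the Matthews Correlation Coefficient can be computed from predictions. It is not the authors' evaluation code: it assumes a binary simplification of the WD classification task, and the function names and the toy labels are hypothetical, chosen only for illustration.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one target class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def matthews_corrcoef(tp, fp, fn, tn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy, made-up labels for an imbalanced binary task (1 = minority class).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
mcc = matthews_corrcoef(tp, fp, fn, tn)
print(f"F1 = {f1:.4f}, MCC = {mcc:.4f}")
```

Unlike the F-score, the MCC also accounts for true negatives, which is one reason it is often reported alongside the F-score on imbalanced data.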

Grants and funding

R01 AG068007/AG/NIA NIH HHS/United States