Detection of Depression-Related Tweets in Mexico Using Crosslingual Schemes and Knowledge Distillation

Jorge Pool-Cen; Hugo Carlos-Martínez; Gandhi Hernández-Chan; Oscar Sánchez-Siordia

doi:10.3390/healthcare11071057

Healthcare (Basel). 2023 Apr 6;11(7):1057. doi: 10.3390/healthcare11071057.

Authors

Jorge Pool-Cen ¹, Hugo Carlos-Martínez ¹ ² ³, Gandhi Hernández-Chan ¹ ² ³, Oscar Sánchez-Siordia ¹ ³

Affiliations

¹ Geospatial Information Sciences Research Center, Mexico City 14240, Mexico.
² IxM CONACyT, Mexico City 14240, Mexico.
³ Laboratorio Nacional de Geointeligencia (GeoInt), Mexico City 14240, Mexico.

Abstract

Mental health problems are among the many ills that afflict the world's population. Their early diagnosis and medical care are public health challenges that have been addressed from various perspectives. Depression is among the mental illnesses that most afflict the population; its early diagnosis is vitally important because it can lead to more severe conditions, such as suicidal ideation. Due to the lack of homogeneity in current diagnostic tools, the community has focused on using AI tools for timely diagnosis. Unfortunately, the data needed to apply AI tools to the Spanish language are lacking. Our work uses a cross-lingual scheme to address this issue, allowing us to identify depression-related texts in both Spanish and English. The experiments demonstrated the methodology's effectiveness with an F1-score of 0.95. With this methodology, we propose a way to classify depression-related tweets (or short texts) in languages with insufficient data to train a classification model, such as Spanish, by reusing English-language databases. We also validated the information obtained with public data to analyze the behavior of depression in Mexico during the COVID-19 pandemic. Our results show that these methodologies can support not only the diagnosis of depression but also the construction of databases in different languages that allow the creation of more efficient diagnostic tools.

Keywords: COVID-19; Twitter; depression; dimensionality reduction; knowledge distillation; text classification.

Grants and funding

This research received no external funding.