Validating Machine Learning Algorithms for Twitter Data Against Established Measures of Suicidality

JMIR Ment Health. 2016 May 16;3(2):e21. doi: 10.2196/mental.4822.

Abstract

Background: Suicide is one of the leading causes of death in the United States (US), and new methods of assessment are needed to track suicide risk in real time.

Objective: Our objective is to validate the use of machine learning algorithms for Twitter data against empirically validated measures of suicidality in the US population.

Methods: Using a machine learning algorithm, the Twitter feeds of 135 Mechanical Turk (MTurk) participants were compared with validated, self-report measures of suicide risk.

Results: Our findings show that people who are at high suicidal risk can be easily differentiated from those who are not by machine learning algorithms, which correctly identify clinically significant suicidality in 92% of cases (sensitivity: 53%, specificity: 97%, positive predictive value: 75%, negative predictive value: 93%).
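
The five figures reported above all derive from a single 2x2 confusion matrix, and the following minimal sketch shows how each is computed. The abstract does not report the underlying counts, so the counts used here (9 true positives, 8 false negatives, 3 false positives, 115 true negatives) are hypothetical values chosen only because they sum to 135 participants and round to the reported percentages. The names `ConfusionCounts` and `screening_metrics` are likewise illustrative and not from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table: algorithm prediction vs. self-report measure."""
    tp: int  # flagged high risk, high risk on the self-report measure
    fn: int  # not flagged, but high risk on the self-report measure
    fp: int  # flagged high risk, but not high risk on the measure
    tn: int  # not flagged, not high risk on the measure


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the standard screening metrics as fractions in [0, 1]."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # share of high-risk people flagged
        "specificity": c.tn / (c.tn + c.fp),  # share of low-risk people cleared
        "ppv": c.tp / (c.tp + c.fp),          # share of flags that are correct
        "npv": c.tn / (c.tn + c.fn),          # share of clearances that are correct
    }


# HYPOTHETICAL counts, not taken from the study: chosen to total 135
# participants and to round to the percentages reported in the abstract.
example = ConfusionCounts(tp=9, fn=8, fp=3, tn=115)
metrics = screening_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Under these assumed counts the script prints 92%, 53%, 97%, 75%, and 93%. It also shows why overall accuracy can be high while sensitivity stays near one half: when the high-risk group is small, the many correctly cleared low-risk participants dominate the accuracy figure.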

Conclusions: Machine learning algorithms are efficient in differentiating people who are at suicidal risk from those who are not. Evidence for suicidality can be measured in nonclinical populations using social media data.

Keywords: machine learning; social media; suicide; twitter.