Machine Learning-Based Prediction of Suicidal Thinking in Adolescents by Derivation and Validation in 3 Independent Worldwide Cohorts: Algorithm Development and Validation Study

J Med Internet Res. 2024 May 17:26:e55913. doi: 10.2196/55913.

Abstract

Background: Suicide is the second-leading cause of death among adolescents and is associated with clusters of suicides. Despite numerous studies on this preventable cause of death, the focus has primarily been on single nations and traditional statistical methods.

Objective: This study aims to develop a predictive model for adolescent suicidal thinking using multinational data sets and machine learning (ML).

Methods: We used data from the Korea Youth Risk Behavior Web-based Survey with 566,875 adolescents aged between 13 and 18 years and conducted external validation using the Youth Risk Behavior Survey (United States) with 103,874 adolescents and Norway's University National General Survey with 19,574 adolescents. Several tree-based ML models were developed, and feature importance and Shapley additive explanations (SHAP) values were analyzed to identify risk factors for adolescent suicidal thinking.

Results: When trained on the Korea Youth Risk Behavior Web-based Survey data from South Korea, the XGBoost model achieved an area under the receiver operating characteristic curve (AUROC) of 90.06% (95% CI 89.97-90.16), outperforming the other models. In external validation using the Youth Risk Behavior Survey data from the United States and the University National General Survey data from Norway, the XGBoost model achieved AUROCs of 83.09% and 81.27%, respectively. Across all data sets, XGBoost consistently achieved the highest AUROC and was selected as the optimal model. Among predictors of suicidal thinking, feelings of sadness and despair were the most influential, accounting for 57.4% of the impact, followed by stress status (19.8%), age (5.7%), household income (4%), academic achievement (3.4%), and sex (2.1%); the remaining predictors contributed less than 2% each.
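For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an AUROC and a 95% CI can be computed from true labels and predicted scores. The abstract does not state how the interval was obtained, so the percentile bootstrap used here is an assumption, not the authors' method; the function names (`auroc`, `bootstrap_ci`) and the toy data in the usage note are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation.

    Tied scores receive their average rank, so a tie between a positive
    and a negative case counts as half a correct ordering.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

As a usage example on made-up data, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75, and `bootstrap_ci` on the same inputs returns a lower and upper bound within [0, 1]. With samples as large as those in the study, the interval would be far narrower than on a toy example like this.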

Conclusions: This study applied ML to diverse data sets from 3 countries to address adolescent suicide. The findings highlight the important role of emotional health indicators in predicting suicidal thinking among adolescents. Specifically, sadness and despair were identified as the most significant predictors, followed by stressful conditions and age. These findings emphasize the critical need for early diagnosis and prevention of mental health issues during adolescence.

Keywords: SHAP value; Shapley additive explanations; XGBoost; adolescent; machine learning; mental health; predictive model; risk behavior; suicidal thinking.

Publication types

  • Validation Study

MeSH terms

  • Adolescent
  • Adolescent Behavior / psychology
  • Algorithms
  • Cohort Studies
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Norway
  • Republic of Korea
  • Risk Factors
  • Risk-Taking
  • Suicidal Ideation*
  • Suicide / psychology
  • Suicide / statistics & numerical data
  • Surveys and Questionnaires