Using CatBoost algorithm to identify middle-aged and elderly depression, national health and nutrition examination survey 2011-2018

Psychiatry Res. 2021 Dec:306:114261. doi: 10.1016/j.psychres.2021.114261. Epub 2021 Nov 1.

Abstract

Depression is one of the most common mental health problems in middle-aged and elderly people. Establishing a risk-factor-based depression risk assessment model supports early detection and early treatment in high-risk groups. Five machine learning models (logistic regression (LR), back propagation (BP), random forest (RF), support vector machines (SVM), and category boosting (CatBoost)) were used to identify depression among 8374 middle-aged and 4636 elderly participants in the NHANES database from 2011 to 2018. Over the 2011-2018 cycles, the estimated prevalence of depression was 8.97% in the middle-aged participants and 8.02% in the elderly participants. In both the middle-aged and the elderly participants, CatBoost was the best model for identifying depression, with the highest area under the receiver operating characteristic curve (AUC). The LR and SVM models ranked next, while the BP and RF models performed slightly worse. In middle-aged men, the most influential predictor of depression was alanine aminotransferase. All five machine learning models could identify depression in the NHANES data set from the sociodemographic, lifestyle, laboratory, and other data of middle-aged and elderly people; of the five, CatBoost performed best.
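The models above are ranked by AUC. As a minimal sketch of what that metric measures, the snippet below computes AUC in its rank-based form: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half. The labels, the scores, and the names `roc_auc`, `model_a`, and `model_b` are made up for illustration; they are not the study's data, models, or code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative data only: 1 = depressed, 0 = not depressed.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
# Predicted risk scores from two hypothetical models.
model_a = [0.9, 0.2, 0.7, 0.4, 0.1, 0.45, 0.3, 0.5]
model_b = [0.6, 0.5, 0.4, 0.7, 0.2, 0.8, 0.3, 0.1]

auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
best = "model_a" if auc_a > auc_b else "model_b"
print(f"model_a AUC={auc_a:.3f}, model_b AUC={auc_b:.3f}, best={best}")
```

The pairwise loop is quadratic in sample size, which is fine for a toy example; on a cohort of the size described here, a sorted-rank implementation from a statistics library would be the more practical choice.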

Keywords: Depression; Machine learning; Middle-aged and elderly; NHANES.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Algorithms
  • Depression* / diagnosis
  • Depression* / epidemiology
  • Humans
  • Logistic Models
  • Machine Learning*
  • Male
  • Middle Aged
  • Nutrition Surveys
  • Support Vector Machine