Predicting Depression in Community Dwellers Using a Machine Learning Algorithm

Diagnostics (Basel). 2021 Aug 7;11(8):1429. doi: 10.3390/diagnostics11081429.

Abstract

Depression is one of the leading causes of disability worldwide. Given its socioeconomic burden, appropriate depression screening for community dwellers is necessary. We used data from the 2014 and 2016 Korea National Health and Nutrition Examination Surveys. The 2014 dataset served as the training set and the 2016 dataset as the hold-out test set. The synthetic minority oversampling technique (SMOTE) was applied to the 2014 dataset to correct the class imbalance between the depression and non-depression groups. The least absolute shrinkage and selection operator (LASSO) was used both for feature reduction and as the classifier in the final model. Data from 9488 participants were included in the machine learning process. The depression group had poorer socioeconomic, health, functional, and biological measures than the non-depression group. Of the initial 37 variables, 13 were selected using LASSO. All performance measures were calculated on the raw 2016 dataset, without SMOTE. In the hold-out test set, the area under the receiver operating characteristic curve was 0.903 and the overall accuracy was 0.828. Perceived stress had the strongest influence on the classification model for depression. A LASSO-based model with a small number of variables can be practically applied to depression screening in community dwellers. Future studies are needed to develop a more efficient and accurate classification model for depression.

Keywords: LASSO; depression; logistic regression; machine learning; mental health.
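
The workflow summarized in the abstract (oversample the minority class in the training set only, fit a LASSO classifier, then evaluate on the untouched hold-out set) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the LASSO classifier is an L1-penalized logistic regression, as the keywords suggest, and uses a simplified from-scratch SMOTE and proximal gradient solver in pure Python. The toy data, the feature layout (one informative "stress" column and one noise column), the function names, and all hyperparameters are hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: interpolate between a minority row and one of its
    k nearest minority neighbours to create n_new synthetic rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((r for r in minority if r is not base),
                            key=lambda r: math.dist(base, r))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]


def fit_l1_logistic(X, y, lam, lr=0.1, epochs=500):
    """Proximal gradient descent on mean log-loss + lam * ||w||_1.
    The intercept b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errs = [p - yi for p, yi in zip(predict_proba(X, w, b), y)]
        b -= lr * sum(errs) / n
        for j in range(d):
            step = w[j] - lr * sum(e * row[j] for e, row in zip(errs, X)) / n
            # Soft-thresholding: small coefficients become exactly zero,
            # which is what makes LASSO a feature selector.
            w[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
    return w, b


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_toy(n_neg, n_pos, seed):
    """Hypothetical imbalanced data: column 0 is informative, column 1 is noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    X += [[rng.gauss(2, 1), rng.gauss(0, 1)] for _ in range(n_pos)]
    return X, [0] * n_neg + [1] * n_pos


# Separate training and hold-out sets (stand-ins for the two survey years).
X_train, y_train = make_toy(200, 20, seed=1)
X_test, y_test = make_toy(200, 20, seed=2)

# Oversample the minority class in the training set only.
minority = [row for row, t in zip(X_train, y_train) if t == 1]
extra = smote(minority, n_new=y_train.count(0) - len(minority))
X_bal, y_bal = X_train + extra, y_train + [1] * len(extra)

# Fit on the balanced training data, evaluate on the raw hold-out data.
w, b = fit_l1_logistic(X_bal, y_bal, lam=0.05)
probs = predict_proba(X_test, w, b)
test_auc = auc(y_test, probs)
accuracy = sum((p >= 0.5) == (t == 1) for p, t in zip(probs, y_test)) / len(y_test)
print(f"coefficients={w}, AUC={test_auc:.3f}, accuracy={accuracy:.3f}")
```

Oversampling only the training split means the reported metrics reflect the real class ratio, which matches the abstract's statement that performance was calculated on the raw test data. In practice, library implementations such as the `SMOTE` class in imbalanced-learn and `LogisticRegression` with an L1 penalty in scikit-learn would normally replace the hand-written functions above.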