Predicting Depression in Community Dwellers Using a Machine Learning Algorithm

Seo-Eun Cho; Zong Woo Geem; Kyoung-Sae Na

doi:10.3390/diagnostics11081429

Diagnostics (Basel). 2021 Aug 7;11(8):1429. doi: 10.3390/diagnostics11081429.

Authors

Seo-Eun Cho¹, Zong Woo Geem², Kyoung-Sae Na¹

Affiliations

¹ Department of Psychiatry, Gachon University College of Medicine, Gil Medical Center, Incheon 21565, Korea.
² College of IT Convergence, Gachon University, Seongnam 13120, Korea.

Abstract

Depression is one of the leading causes of disability worldwide. Given its socioeconomic burden, appropriate depression screening for community dwellers is necessary. We used data from the 2014 and 2016 Korea National Health and Nutrition Examination Surveys: the 2014 dataset served as the training set and the 2016 dataset as the hold-out test set. The synthetic minority oversampling technique (SMOTE) was applied to the 2014 dataset to correct the class imbalance between the depression and non-depression groups. The least absolute shrinkage and selection operator (LASSO) was used both for feature reduction and as the classifier in the final model. Data from 9488 participants were used for the machine learning process. The depression group had poorer socioeconomic, health, functional, and biological measures than the non-depression group. Of the initial 37 variables, 13 were selected using LASSO. All performance measures were calculated on the raw 2016 dataset, without SMOTE. In the hold-out test set, the area under the receiver operating characteristic curve was 0.903 and the overall accuracy was 0.828. Perceived stress had the strongest influence on the classification model for depression. LASSO can be practically applied to depression screening of community dwellers using only a few variables. Future studies are needed to develop more efficient and accurate classification models for depression.

Keywords: LASSO; depression; logistic regression; machine learning; mental health.
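
The workflow summarized in the abstract (oversample the minority class in the training year only, fit an L1-penalized logistic model, then score the untouched hold-out year) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the data are synthetic, and every name (make_data, smote, fit_l1_logistic, auc), the penalty strength, and the hand-written SMOTE and proximal-gradient solver are assumptions made for the example. In practice, established libraries such as imbalanced-learn and scikit-learn would likely be used instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def make_data(n, rng):
    """Synthetic stand-in for one survey year: 10 features, 2 informative."""
    X = rng.normal(size=(n, 10))
    logit = 2.0 * X[:, 0] - 1.5 * X[:, 1] - 3.0  # negative offset -> rare positives
    y = (rng.random(n) < sigmoid(logit)).astype(int)
    return X, y

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE: interpolate between a minority row and one of its
    k nearest minority neighbours (assumes len(X_min) > k)."""
    rng = np.random.default_rng(rng)
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)                  # a row is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # position along the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=1000):
    """L1-penalized logistic regression via proximal gradient descent."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        err = sigmoid(X @ w + b) - y
        w = w - lr * (X.T @ err) / n
        b = b - lr * err.mean()                  # intercept is not penalized
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft threshold
    return w, b

def auc(y_true, score):
    """Rank-based AUC (Mann-Whitney); assumes no tied scores."""
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
X_train, y_train = make_data(2000, rng)  # plays the role of the training year
X_test, y_test = make_data(2000, rng)    # plays the role of the hold-out year

# Oversample the minority class in the training set only.
X_min = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_bal = np.vstack([X_train, smote(X_min, n_new, rng=rng)])
y_bal = np.concatenate([y_train, np.ones(n_new, dtype=int)])

w, b = fit_l1_logistic(X_bal, y_bal)

# Evaluate on the raw, non-oversampled hold-out set.
score = sigmoid(X_test @ w + b)
test_auc = auc(y_test, score)
test_acc = ((score >= 0.5).astype(int) == y_test).mean()
print("features with nonzero weight:", np.flatnonzero(w))
print(f"AUC={test_auc:.3f} accuracy={test_acc:.3f}")
```

The design point the sketch tries to mirror is that synthetic rows enter only the training data, so the reported metrics reflect the natural class balance of the hold-out sample. The L1 penalty drives some coefficients exactly to zero, which is how a single model can serve as both feature selector and classifier.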
