Identification of influence factors in overweight population through an interpretable risk model based on machine learning: a large retrospective cohort

Wei Lin; Songchang Shi; Huiyu Lan; Nengying Wang; Huibin Huang; Junping Wen; Gang Chen

doi:10.1007/s12020-023-03536-y

Endocrine. 2024 Mar;83(3):604-614. doi: 10.1007/s12020-023-03536-y. Epub 2023 Sep 30.

Authors

Wei Lin^#¹, Songchang Shi^#², Huiyu Lan³, Nengying Wang³, Huibin Huang³, Junping Wen³, Gang Chen⁴

Affiliations

¹ Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, 350001, PR China. caolalin0929@163.com.
² Department of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital South Branch, Fujian Provincial Hospital Jinshan Branch, Fujian Provincial Hospital, Fuzhou, 350001, PR China.
³ Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, 350001, PR China.
⁴ Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, 350001, PR China. chengangfj@163.com.

^# Contributed equally.

PMID: 37776483
DOI: 10.1007/s12020-023-03536-y

Abstract

Background: Identifying the risk factors associated with overweight is crucial for predicting future health risks and designing behavioral interventions. However, machine learning practice still lacks consensus on several issues, such as cross-validation, and the resulting models may suffer from overfitting or poor interpretability.

Methods: This study employed nine commonly used machine learning methods to construct overweight risk models. The study targeted the general community: a total of 10,905 Chinese participants from Ningde City, Fujian Province, southeast China, were included. The best model was selected through appropriate verification and validation, and its predictions were then explained.
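
The abstract does not state which validation scheme was used. As a minimal sketch of how a best model might be chosen among candidates, the following assumes plain k-fold cross-validation with accuracy as the score. The two candidate learners, the synthetic data, and all function names are hypothetical stand-ins, not the nine methods or the cohort of the study.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy of a learner over k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(model(X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

def fit_majority(X_train, y_train):
    """Hypothetical baseline: always predict the most common training label."""
    label = 1 if 2 * sum(y_train) > len(y_train) else 0
    return lambda x: label

def fit_threshold(X_train, y_train):
    """Hypothetical learner: cut at the midpoint between the two class means."""
    zeros = [x for x, t in zip(X_train, y_train) if t == 0]
    ones = [x for x, t in zip(X_train, y_train) if t == 1]
    cut = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x > cut else 0

# Synthetic single-feature data (an assumption, not study data).
X = [float(v) for v in range(60, 110)]
y = [1 if v >= 85 else 0 for v in X]

candidates = {"majority": fit_majority, "threshold": fit_threshold}
scores = {name: cross_val_accuracy(fit, X, y) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
```

Because every candidate is scored only on samples it did not see during fitting, the comparison penalizes models that merely memorize the training data, which is the overfitting concern raised in the Background.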

Results: The machine learning overweight risk models performed well. CatBoost, when used to construct clinical risk models, may outperform earlier machine learning methods. A visual display of the Shapley additive explanation (SHAP) values for the model's variables accurately represented the influence of each variable in the model.
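
To illustrate what a Shapley additive explanation value measures, the sketch below computes exact Shapley values for a single prediction by enumerating every feature subset. It is an illustration under stated assumptions only: the toy risk function, the three feature names, and the baseline input are hypothetical, and practical SHAP tooling uses faster model-specific algorithms rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for instance x relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the instance's values; the rest stay at baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_risk(z):
    """Hypothetical risk score with one interaction term (not the study's model)."""
    age, waist, exercise = z
    return 0.02 * age + 0.05 * waist - 0.3 * exercise + 0.001 * age * waist

x = [50.0, 90.0, 1.0]         # hypothetical individual: age, waist (cm), exercises (0/1)
baseline = [40.0, 80.0, 0.0]  # hypothetical reference individual
phi = shapley_values(toy_risk, x, baseline)
```

The values in `phi` sum to `toy_risk(x) - toy_risk(baseline)`, so each entry can be read as that variable's share of the difference between this individual's predicted risk and the reference prediction. The interaction term is split evenly between age and waist.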

Conclusions: Machine learning may currently be the best approach for constructing an overweight risk model, and CatBoost may be the best of the machine learning methods. Furthermore, combining Shapley additive explanations with machine learning methods can be effective in identifying disease risk factors for prevention and control.

Keywords: Interpretable; Machine learning; Overweight; Prediction model; Risk.

MeSH terms

China / epidemiology
East Asian People
Humans
Machine Learning*
Overweight* / epidemiology
Retrospective Studies
Risk Factors
