Application and interpretation of machine learning models in predicting the risk of severe obstructive sleep apnea in adults

Yewen Shi; Yitong Zhang; Zine Cao; Lina Ma; Yuqi Yuan; Xiaoxin Niu; Yonglong Su; Yushan Xie; Xi Chen; Liang Xing; Xinhong Hei; Haiqin Liu; Shinan Wu; Wenle Li; Xiaoyong Ren

doi:10.1186/s12911-023-02331-z

BMC Med Inform Decis Mak. 2023 Oct 19;23(1):230. doi: 10.1186/s12911-023-02331-z.

Authors

Yewen Shi^#¹, Yitong Zhang^#¹, Zine Cao^#¹, Lina Ma¹, Yuqi Yuan¹, Xiaoxin Niu¹, Yonglong Su¹, Yushan Xie¹, Xi Chen¹, Liang Xing¹, Xinhong Hei², Haiqin Liu¹, Shinan Wu³, Wenle Li⁴, Xiaoyong Ren⁵

Affiliations

¹ Department of Otorhinolaryngology Head and Neck Surgery, The Second Affiliated Hospital of Xi'an Jiaotong University, NO. 157 Xi Wu Road, Xi'an, Shaan'xi Province, China.
² School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaan'xi Province, China.
³ School of Medicine, Eye Institute of Xiamen University, Xiamen University, Xiamen, Fujian Province, China. wshinana99@163.com.
⁴ Molecular Imaging and Translational Medicine Research Center, State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, Xiamen University, Xiamen, Fujian Province, China. drlee0910@163.com.
⁵ Department of Otorhinolaryngology Head and Neck Surgery, The Second Affiliated Hospital of Xi'an Jiaotong University, NO. 157 Xi Wu Road, Xi'an, Shaan'xi Province, China. cor_renxiaoyong@126.com.

^# Contributed equally.

Abstract

Background: Obstructive sleep apnea (OSA) is a globally prevalent disease whose diagnosis is complex. Severe OSA is associated with multi-system dysfunction. We aimed to develop an interpretable machine learning (ML) model for predicting the risk of severe OSA and for analyzing its risk factors based on clinical characteristics and questionnaires.

Methods: This was a retrospective study comprising 1656 subjects who presented and underwent polysomnography (PSG) between 2018 and 2021. A total of 23 variables were included, and after univariate analysis, 15 variables were selected for further preprocessing. Six types of classification models were used to evaluate the ability to predict severe OSA, namely logistic regression (LR), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), bootstrapped aggregating (Bagging), and multilayer perceptron (MLP). For all models, the area under the receiver operating characteristic curve (AUC) was calculated as the performance metric. We also drew SHapley Additive exPlanations (SHAP) plots to interpret predictive results and to analyze the relative importance of risk factors. An online calculator was developed to estimate the risk of severe OSA in individuals.
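As an illustration of the performance metric named in the Methods, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney formulation). It is not the authors' code; the function name `roc_auc` and the labels and scores are invented for the example and are not study data.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs
    in which the positive case has the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = severe OSA, 0 = not severe) and model scores.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.7, 0.2, 0.6]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 14 of 16 pairs are ranked correctly -> 0.875
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; a rank-based implementation would be preferable for a cohort of this size.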

Results: Among the enrolled subjects, 61.47% (1018/1656) were diagnosed with severe OSA. Multivariate LR analysis showed that 10 of 23 variables were independent risk factors for severe OSA. The GBM model showed the best performance (AUC = 0.857, accuracy = 0.766, sensitivity = 0.798, specificity = 0.734). An online calculator was developed to estimate the risk of severe OSA based on the GBM model. Finally, waist circumference, neck circumference, the Epworth Sleepiness Scale, age, and the Berlin questionnaire were revealed by the SHAP plot as the top five critical variables contributing to the diagnosis of severe OSA. Additionally, two typical cases were analyzed to interpret the contribution of each variable to the outcome prediction in a single patient.
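To make the single-patient interpretation concrete, the sketch below computes exact Shapley values for a toy three-feature score by enumerating feature subsets. It is an illustration only: the scoring function, its coefficients, the patient values, and the baseline are all hypothetical, and the study itself used SHAP plots on the fitted models rather than this brute-force procedure. "Absent" features are replaced by a single baseline value, which is one common simplification.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to baseline.
    Enumerates all subsets, so it is only practical for a few features."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's value, others the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_score(z):
    # Hypothetical linear score; coefficients are invented for illustration.
    waist, neck, ess = z
    return 0.02 * waist + 0.05 * neck + 0.03 * ess


names = ["waist circumference", "neck circumference", "ESS"]
patient = [110.0, 44.0, 16.0]   # hypothetical patient
baseline = [90.0, 38.0, 8.0]    # hypothetical reference values

phi = shapley_values(toy_risk_score, patient, baseline)
for name, contribution in zip(names, phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets a per-patient plot decompose one prediction into per-variable parts.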

Conclusions: We established six risk prediction models for severe OSA using ML algorithms. Among them, the GBM model performed best. The model facilitates individualized assessment and further clinical strategies for patients with suspected severe OSA. This will help to identify patients with severe OSA as early as possible and ensure their timely treatment.

Trial registration: Retrospectively registered.

Keywords: Gradient boosting machine; Machine learning; Obstructive sleep apnea; Prediction model; Risk factor; Shapley additive explanations.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adult
Humans
Machine Learning
ROC Curve
Retrospective Studies
Risk Factors
Sleep Apnea, Obstructive* / diagnosis
Sleep Apnea, Obstructive* / epidemiology