Application and interpretation of machine learning models in predicting the risk of severe obstructive sleep apnea in adults

BMC Med Inform Decis Mak. 2023 Oct 19;23(1):230. doi: 10.1186/s12911-023-02331-z.

Abstract

Background: Obstructive sleep apnea (OSA) is a globally prevalent disease that is complex to diagnose. Severe OSA is associated with multi-system dysfunction. We aimed to develop an interpretable machine learning (ML) model for predicting the risk of severe OSA and for analyzing its risk factors based on clinical characteristics and questionnaires.

Methods: This was a retrospective study comprising 1656 subjects who presented and underwent polysomnography (PSG) between 2018 and 2021. A total of 23 variables were included, and after univariate analysis, 15 variables were selected for further preprocessing. Six types of classification models were used to evaluate the ability to predict severe OSA, namely logistic regression (LR), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), bootstrapped aggregating (Bagging), and multilayer perceptron (MLP). The area under the receiver operating characteristic curve (AUC) was calculated as the performance metric for all models. We also drew SHapley Additive exPlanations (SHAP) plots to interpret predictive results and to analyze the relative importance of risk factors. An online calculator was developed to estimate the risk of severe OSA in individuals.
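
As an illustration of the AUC comparison described in the Methods, the following is a minimal sketch rather than the authors' code. It computes AUC directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels, the scores, and the names `model_a` and `model_b` are hypothetical and are not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = severe OSA) and predicted risks from two
# hypothetical models; none of these numbers come from the study.
y_true = [1, 1, 1, 0, 1, 0, 0, 1]
model_scores = {
    "model_a": [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2],
    "model_b": [0.9, 0.8, 0.7, 0.65, 0.6, 0.4, 0.2, 0.5],
}

# Score every candidate with the same metric, then pick the highest AUC.
aucs = {name: roc_auc(y_true, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)

for name, auc in aucs.items():
    print(f"{name}: AUC = {auc:.3f}")
print("best:", best)
```

With these toy inputs, `model_a` ranks 11 of 15 positive-negative pairs correctly and `model_b` ranks 13 of 15, so `model_b` is selected. In practice the comparison would be run on held-out data, and a library routine would replace the quadratic pair loop.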

Results: Among the enrolled subjects, 61.47% (1018/1656) were diagnosed with severe OSA. Multivariate LR analysis showed that 10 of 23 variables were independent risk factors for severe OSA. The GBM model showed the best performance (AUC = 0.857, accuracy = 0.766, sensitivity = 0.798, specificity = 0.734). An online calculator was developed to estimate the risk of severe OSA based on the GBM model. Finally, waist circumference, neck circumference, the Epworth Sleepiness Scale, age, and the Berlin questionnaire were revealed by the SHAP plot as the top five critical variables contributing to the diagnosis of severe OSA. Additionally, two typical cases were analyzed to interpret the contribution of each variable to the outcome prediction in a single patient.
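
The single-patient interpretation mentioned in the Results rests on Shapley values, which split the difference between one patient's prediction and a reference prediction into one contribution per variable. The sketch below computes exact Shapley values by enumerating feature subsets. It assumes a hypothetical three-variable log-odds function (`toy_log_odds`) and a single reference patient (`baseline`); the coefficients and patient values are invented for illustration and are not the study's GBM model or data. The SHAP library estimates these quantities more efficiently and typically against a background dataset rather than a single reference point.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values of value_fn at x relative to a baseline input.

    Features outside a coalition are held at their baseline values.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)
    features = list(range(n))
    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in features]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in features]
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi


def toy_log_odds(v):
    """Hypothetical risk score (log-odds); coefficients are made up."""
    waist, neck, ess = v
    return (0.04 * (waist - 95) + 0.10 * (neck - 40) + 0.05 * (ess - 8)
            + 0.002 * (waist - 95) * (neck - 40))


names = ["waist circumference", "neck circumference", "Epworth Sleepiness Scale"]
baseline = [95.0, 40.0, 8.0]   # hypothetical reference patient
patient = [110.0, 44.0, 14.0]  # hypothetical individual patient

phi = shapley_values(toy_log_odds, patient, baseline)
for name, contribution in zip(names, phi):
    print(f"{name}: {contribution:+.3f}")

# Efficiency property: contributions sum to f(patient) - f(baseline).
print("sum of contributions:", round(sum(phi), 6))
print("prediction difference:",
      round(toy_log_odds(patient) - toy_log_odds(baseline), 6))
```

In this toy model the interaction between waist and neck circumference is shared equally between the two variables, and the contributions add up exactly to the gap between the patient's score and the reference score. That additivity is what allows a per-patient plot to attribute a prediction to individual variables.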

Conclusions: We established six risk prediction models for severe OSA using ML algorithms. Among them, the GBM model performed best. The model facilitates individualized risk assessment and can inform further clinical strategies for patients with suspected severe OSA. This will help to identify patients with severe OSA as early as possible and ensure their timely treatment.

Trial registration: Retrospectively registered.

Keywords: Gradient boosting machine; Machine learning; Obstructive sleep apnea; Prediction model; Risk factor; Shapley additive explanations.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Humans
  • Machine Learning
  • ROC Curve
  • Retrospective Studies
  • Risk Factors
  • Sleep Apnea, Obstructive* / diagnosis
  • Sleep Apnea, Obstructive* / epidemiology