[Application of machine learning model based on XGBoost algorithm in early prediction of patients with acute severe pancreatitis]

Zhonghua Wei Zhong Bing Ji Jiu Yi Xue. 2023 Apr;35(4):421-426. doi: 10.3760/cma.j.cn121430-20221019-00930.
[Article in Chinese]

Abstract

Objective: To establish a machine learning model based on the extreme gradient boosting (XGBoost) algorithm for early prediction of severe acute pancreatitis (SAP), and to evaluate its predictive performance.

Methods: A retrospective cohort study was conducted. Patients with acute pancreatitis (AP) admitted to the First Affiliated Hospital of Soochow University, the Second Affiliated Hospital of Soochow University and Changshu Hospital Affiliated to Soochow University from January 1, 2020 to December 31, 2021 were enrolled. Demographic information, etiology, past medical history, and clinical indicators and imaging data within 48 hours of admission were collected from the medical record and imaging systems, and the modified CT severity index (MCTSI), Ranson score, bedside index for severity in acute pancreatitis (BISAP) and acute pancreatitis risk score (SABP) were calculated. The data sets from the First Affiliated Hospital of Soochow University and Changshu Hospital Affiliated to Soochow University were randomly divided into a training set and a validation set at a ratio of 8:2. The SAP prediction model was built with the XGBoost algorithm, with hyperparameters tuned using 5-fold cross-validation and the loss function. The data set from the Second Affiliated Hospital of Soochow University served as an independent test set. The predictive performance of the XGBoost model was evaluated with the receiver operating characteristic (ROC) curve and compared with that of traditional AP severity scores; a variable importance ranking plot and Shapley additive explanation (SHAP) plots were drawn to visually interpret the model.
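The data partitioning described in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the record IDs, the `seed` values, the unstratified shuffle and the `score_fn` callback are all hypothetical. The abstract does not say whether the 8:2 split was stratified by outcome, and in the study the scoring step would presumably train an XGBoost model on the training folds and return a loss-based score on the held-out fold.

```python
import random


def split_cohort(record_ids, train_frac=0.8, seed=42):
    """Shuffle record IDs and split them into training and validation sets."""
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_frac)  # 983 * 0.8 = 786.4 -> 786
    return ids[:n_train], ids[n_train:]


def k_fold_indices(n, k=5, seed=0):
    """Return k (train_idx, val_idx) pairs; each index is held out exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        pairs.append((train, folds[i]))
    return pairs


def select_hyperparameters(grid, score_fn, n, k=5, seed=0):
    """Pick the parameter set with the highest mean cross-validated score.

    score_fn(params, train_idx, val_idx) -> float is a hypothetical,
    caller-supplied function (higher is better, e.g. negative log-loss).
    """
    folds = k_fold_indices(n, k, seed)
    best_params, best_score = None, float("-inf")
    for params in grid:
        mean_score = sum(score_fn(params, tr, va) for tr, va in folds) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


if __name__ == "__main__":
    # Hypothetical IDs standing in for the 983 pooled development records.
    train_ids, val_ids = split_cohort(range(983))
    print(len(train_ids), len(val_ids))  # 786 197
```

With the 983 records pooled from the First Affiliated Hospital and Changshu Hospital, `round(983 * 0.8)` yields 786 training and 197 validation records, the same counts reported in the Results; the 200 records from the Second Affiliated Hospital never enter either step, which is what makes them an independent test set.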

Results: A total of 1,183 AP patients were finally enrolled, of whom 129 (10.9%) developed SAP. Of the patients from the First Affiliated Hospital of Soochow University and Changshu Hospital Affiliated to Soochow University, 786 were assigned to the training set and 197 to the validation set; the 200 patients from the Second Affiliated Hospital of Soochow University formed the test set. In all three data sets, patients who progressed to SAP showed pathological abnormalities in respiratory function, coagulation, liver and kidney function, and lipid metabolism. For the XGBoost-based SAP prediction model, ROC curve analysis showed an accuracy of 0.830 and an area under the ROC curve (AUC) of 0.927, a significant improvement over the traditional scoring systems MCTSI, Ranson, BISAP and SABP (accuracy 0.610, 0.690, 0.763 and 0.625; AUC 0.689, 0.631, 0.875 and 0.770, respectively). In the feature importance analysis of the XGBoost model, the ten most important features were pleural effusion on admission (0.119), albumin (Alb, 0.049), triglycerides (TG, 0.036), Ca²⁺ (0.034), prothrombin time (PT, 0.031), systemic inflammatory response syndrome (SIRS, 0.031), C-reactive protein (CRP, 0.031), platelet count (PLT, 0.030), lactate dehydrogenase (LDH, 0.029) and alkaline phosphatase (ALP, 0.028); these indicators contributed most to the model's prediction of SAP. SHAP contribution analysis of the XGBoost model showed that the risk of SAP increased significantly in patients with pleural effusion and decreased Alb.
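Accuracy and AUC answer different questions: accuracy depends on a chosen decision threshold, while the AUC measures how well the scores rank SAP cases above non-SAP cases across all thresholds. The stdlib sketch below computes both; the toy labels and scores and the 0.5 threshold are hypothetical, since the abstract does not state which cut-off was used to compute accuracy for the XGBoost model or for the integer-valued clinical scores.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases where (score >= threshold) agrees with the true label."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = progressed to SAP, 0 = did not.
    y = [1, 1, 0, 0, 0]
    p = [0.9, 0.4, 0.6, 0.2, 0.1]
    print(roc_auc(y, p))   # 5 of 6 positive/negative pairs ranked correctly
    print(accuracy(y, p))  # 0.6
```

Scores such as BISAP take only a few integer values, so many SAP/non-SAP pairs are tied; the 0.5 tie credit is the standard convention that keeps their AUC comparable with the continuous XGBoost output. The pairwise loop is quadratic, which is fine at these cohort sizes; a rank-based formula scales better.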

Conclusions: An SAP prediction scoring system was established based on the XGBoost machine learning algorithm; it can predict a patient's risk of SAP within 48 hours of admission with good accuracy.

Publication types

  • English Abstract

MeSH terms

  • Acute Disease
  • Algorithms
  • Hospitalization
  • Humans
  • Pancreatitis*
  • Retrospective Studies