Artificial intelligence versus surgeon gestalt in predicting risk of emergency general surgery

Mohamad El Moheb; Anthony Gebran; Lydia R Maurer; Leon Naar; Majed El Hechi; Kerry Breen; Ander Dorken-Gallastegi; Robert Sinyard; Dimitris Bertsimas; George Velmahos; Haytham M A Kaafarani

doi:10.1097/TA.0000000000004030

J Trauma Acute Care Surg. 2023 Oct 1;95(4):565-572. doi: 10.1097/TA.0000000000004030. Epub 2023 Jun 14.

Authors

Mohamad El Moheb¹, Anthony Gebran, Lydia R Maurer, Leon Naar, Majed El Hechi, Kerry Breen, Ander Dorken-Gallastegi, Robert Sinyard, Dimitris Bertsimas, George Velmahos, Haytham M A Kaafarani

Affiliation

¹ From the Division of Trauma, Emergency Surgery, and Surgical Critical Care (M.E.M., A.G., L.R.M., L.N., M.E.H., K.B., A.D.-G., R.S., G.V., H.M.A.K.), Massachusetts General Hospital, Boston; and Massachusetts Institute of Technology (D.B.), Cambridge, Massachusetts.

PMID: 37314698

Abstract

Background: Artificial intelligence (AI) risk prediction algorithms such as the smartphone-available Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) for emergency general surgery (EGS) are superior to traditional risk calculators because they account for complex nonlinear interactions between variables, but how they compare to surgeons' gestalt remains unknown. Herein, we sought to: (1) compare POTTER to surgeons' surgical risk estimation and (2) assess how POTTER influences surgeons' risk estimation.

Study design: A total of 150 patients who underwent EGS at a large quaternary care center between May 2018 and May 2019 were prospectively followed up for 30-day postoperative outcomes (mortality, septic shock, ventilator dependence, bleeding requiring transfusion, pneumonia), and clinical cases representing their initial presentation were systematically created. POTTER's outcome predictions for each case were also recorded. Thirty acute care surgeons with diverse practice settings and levels of experience were then randomized into two groups: 15 surgeons (SURG) were asked to predict the outcomes without access to POTTER's predictions, while the remaining 15 (SURG-POTTER) were asked to predict the same outcomes after interacting with POTTER. With actual patient outcomes as the reference, the area under the curve (AUC) methodology was used to assess the predictive performance of (1) POTTER versus SURG and (2) SURG versus SURG-POTTER.
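For readers unfamiliar with the AUC methodology mentioned above, the following is a minimal sketch of how an AUC can be computed from predicted risks and observed binary outcomes. It is not the authors' analysis code: the function name `auc`, the pairwise (Mann-Whitney) formulation, and the six-patient toy data are all assumptions made for illustration. The AUC here is the probability that a randomly chosen patient who experienced the outcome was assigned a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half.

```python
from typing import Sequence


def auc(outcomes: Sequence[int], risks: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    outcomes: 1 if the patient had the event (e.g. 30-day mortality), else 0.
    risks:    predicted probability of the event for the same patients.
    """
    with_event = [r for y, r in zip(outcomes, risks) if y == 1]
    without_event = [r for y, r in zip(outcomes, risks) if y == 0]
    if not with_event or not without_event:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p in with_event:
        for n in without_event:
            if p > n:
                concordant += 1.0   # event patient ranked higher: concordant pair
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(with_event) * len(without_event))


# Hypothetical toy data (NOT from the study): six cases, three with the event.
outcomes = [0, 0, 1, 0, 1, 1]
model_risk = [0.05, 0.10, 0.40, 0.20, 0.70, 0.15]     # stand-in for an algorithm
surgeon_risk = [0.10, 0.30, 0.25, 0.20, 0.60, 0.10]   # stand-in for a surgeon

print(f"model AUC:   {auc(outcomes, model_risk):.3f}")
print(f"surgeon AUC: {auc(outcomes, surgeon_risk):.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking of patients with the outcome above those without it. The sketch covers discrimination only; it does not address calibration or the statistical comparison of two AUCs.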

Results: POTTER outperformed SURG in predicting all outcomes (mortality-AUC: 0.880 vs. 0.841; ventilator dependence-AUC: 0.928 vs. 0.833; bleeding-AUC: 0.832 vs. 0.735; pneumonia-AUC: 0.837 vs. 0.753) except septic shock (AUC: 0.816 vs. 0.820). SURG-POTTER outperformed SURG in predicting mortality (AUC: 0.870 vs. 0.841), bleeding (AUC: 0.811 vs. 0.735), and pneumonia (AUC: 0.803 vs. 0.753), but not septic shock (AUC: 0.712 vs. 0.820) or ventilator dependence (AUC: 0.834 vs. 0.833).
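The reported AUCs can be easier to compare side by side as differences from the unaided-surgeon group. The sketch below simply tabulates the values stated in the Results and subtracts the SURG AUC from each comparator; the dictionary layout and the helper name `delta_vs_surg` are assumptions for illustration, and the differences are plain arithmetic on the published point estimates, with no claim about statistical significance.

```python
# AUCs exactly as reported in the Results paragraph above.
reported = {
    "mortality":             {"POTTER": 0.880, "SURG": 0.841, "SURG-POTTER": 0.870},
    "septic shock":          {"POTTER": 0.816, "SURG": 0.820, "SURG-POTTER": 0.712},
    "ventilator dependence": {"POTTER": 0.928, "SURG": 0.833, "SURG-POTTER": 0.834},
    "bleeding":              {"POTTER": 0.832, "SURG": 0.735, "SURG-POTTER": 0.811},
    "pneumonia":             {"POTTER": 0.837, "SURG": 0.753, "SURG-POTTER": 0.803},
}


def delta_vs_surg(arm: str) -> dict:
    """AUC of `arm` minus AUC of unaided surgeons (SURG), per outcome."""
    return {
        outcome: round(aucs[arm] - aucs["SURG"], 3)
        for outcome, aucs in reported.items()
    }


potter_delta = delta_vs_surg("POTTER")
assisted_delta = delta_vs_surg("SURG-POTTER")

print(f"{'outcome':<24}{'POTTER - SURG':>15}{'SURG-POTTER - SURG':>21}")
for outcome in reported:
    print(f"{outcome:<24}{potter_delta[outcome]:>+15.3f}{assisted_delta[outcome]:>+21.3f}")
```

Positive values indicate a higher AUC than unaided surgeons for that outcome; negative values indicate a lower one.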

Conclusion: The AI risk calculator POTTER outperformed surgeons' gestalt in predicting the postoperative mortality and outcomes of EGS patients and, when used, improved individual surgeons' risk prediction. Artificial intelligence algorithms, such as POTTER, could prove useful as a bedside adjunct to surgeons when preoperatively counseling patients.

Level of evidence: Prognostic and Epidemiological; Level II.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Artificial Intelligence*
Humans
Postoperative Complications / epidemiology
Postoperative Complications / etiology
Prognosis
Risk Assessment / methods
Surgeons*