Evaluating machine learning models for sepsis prediction: A systematic review of methodologies

Hong-Fei Deng; Ming-Wei Sun; Yu Wang; Jun Zeng; Ting Yuan; Ting Li; Di-Huan Li; Wei Chen; Ping Zhou; Qi Wang; Hua Jiang

doi:10.1016/j.isci.2021.103651

iScience. 2021 Dec 20;25(1):103651. doi: 10.1016/j.isci.2021.103651. eCollection 2022 Jan 21.

Authors

Hong-Fei Deng^{1,2}, Ming-Wei Sun³, Yu Wang^{1,2,3,4}, Jun Zeng^{1,2,3,4}, Ting Yuan^{1,2}, Ting Li^{1,2}, Di-Huan Li^{1,2}, Wei Chen⁵, Ping Zhou⁶, Qi Wang⁷, Hua Jiang^{1,2,3,4}

Affiliations

¹ Institute for Emergency and Disaster Medicine, Sichuan Academy of Medical Sciences, Sichuan Provincial People's Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu, Sichuan 610072, China.
² School of Medicine, University of Electronic Science and Technology of China, Chengdu 610054, China.
³ Emergency Center of Sichuan Provincial People's Hospital, Sichuan Academy of Medical Sciences, Chengdu 610072, China.
⁴ Sichuan Clinical Research Center for Emergency and Critical Care, Chengdu, Sichuan 610072, China.
⁵ Department of Clinical Nutrition, Peking Union Medical College Hospital, Beijing 100730, China.
⁶ Emergency Intensive Care Unit of Sichuan Provincial People's Hospital, Sichuan Academy of Medical Sciences, Chengdu 610072, China.
⁷ Beijing Computational Science Research Center, Beijing 100193, China.

Abstract

Research on sepsis prediction using machine learning has grown rapidly in medical science in recent years. In this review, conducted according to PRISMA, we propose a set of new evaluation criteria and reporting standards and use them to assess the quality of 21 qualifying machine learning models. Our assessment shows that (1) the definition of sepsis is not consistent across the studies; (2) data sources, data preprocessing methods, machine learning models, feature engineering, and inclusion types vary widely among the studies; (3) the closer the prediction is made to the onset of sepsis, the higher the AUROC; (4) the improvement in AUROC comes primarily from using machine learning as a feature engineering tool; and (5) deep neural networks combined with the Sepsis-3 diagnostic criteria tend to yield better results on time series data collected from patients with sepsis. The new evaluation criteria and reporting standards will facilitate the development of improved machine learning models for clinical applications.

Keywords: Clinical medicine; Machine learning.
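Findings (3) and (4) above are stated in terms of AUROC, the area under the receiver operating characteristic curve. As a reminder of what that number measures, the following is a minimal Python sketch, not code from the review or from any included study. It computes AUROC as the probability that a randomly chosen positive case (label 1, e.g. a patient who develops sepsis) receives a higher risk score than a randomly chosen negative case (label 0), with ties counting one half. This pairwise form is equivalent to the normalized Mann-Whitney U statistic. The function name `auroc` and the toy labels and scores are hypothetical illustrations, not data from the review.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUROC of `scores` against binary `labels` (1 = positive).

    Hypothetical helper for illustration: counts, over all positive/negative
    pairs, how often the positive case is scored higher (ties count 0.5).
    This O(P*N) pairwise form is simple rather than fast.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # pair ranked correctly
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 3 positive and 3 negative cases.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.3, 0.4, 0.35, 0.1]
    print(round(auroc(labels, scores), 3))  # 8 of 9 pairs ranked correctly -> 0.889
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation. Because the metric depends only on how cases are ranked, not on a decision threshold, it is a convenient common yardstick when comparing models built on different datasets, although such comparisons remain subject to the heterogeneity in sepsis definitions and data sources noted above.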