Benchmarking machine learning models on multi-centre eICU critical care dataset

PLoS One. 2020 Jul 2;15(7):e0235424. doi: 10.1371/journal.pone.0235424. eCollection 2020.

Abstract

Progress of machine learning in critical care has been difficult to track, in part due to the absence of public benchmarks. Other fields of research (such as computer vision and natural language processing) have established various competitions and public benchmarks. The recent availability of large clinical datasets has made it possible to establish such benchmarks. Taking advantage of this opportunity, we propose a public benchmark suite to address four areas of critical care, namely mortality prediction, estimation of length of stay, patient phenotyping, and risk of decompensation. We define each task and compare the performance of clinical models as well as baseline and deep learning models using the eICU critical care dataset of around 73,000 patients. This is the first public benchmark on a multi-centre critical care dataset, comparing the performance of the clinical gold standard with our predictive models. We also investigate the impact of numerical variables as well as the handling of categorical variables on each of the defined tasks. The source code, detailing our methods and experiments, is publicly available so that anyone can replicate our results and build upon our work.
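To give a concrete sense of the kind of baseline such a benchmark compares against, the sketch below sets up a logistic regression model for the in-hospital mortality task on a flattened per-stay feature table, scaling numerical variables and one-hot encoding categorical ones as the abstract discusses. This is a minimal illustration under assumed inputs: the file path, column names, and feature choices are hypothetical and are not the paper's actual pipeline.

    # Hypothetical baseline sketch for the in-hospital mortality task.
    # The CSV path and column names below are assumptions for illustration,
    # not the benchmark's real extraction pipeline.
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    df = pd.read_csv("eicu_stays.csv")  # hypothetical pre-extracted feature table

    numeric_cols = ["age", "heartrate_mean", "map_mean", "gcs_min"]  # assumed
    categorical_cols = ["gender", "admission_source", "unit_type"]   # assumed
    X = df[numeric_cols + categorical_cols]
    y = df["hospital_mortality"]  # assumed binary label column

    # Scale numerical variables and one-hot encode categorical ones, echoing
    # the abstract's point about the handling of the two variable types.
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])

    model = Pipeline([
        ("prep", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ])

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"Test AUROC: {auc:.3f}")

The same pipeline shape carries over to the other tasks in the suite (length of stay as regression, phenotyping as multi-label classification, decompensation as a rolling binary prediction), with only the label and estimator swapped; the paper's released source code should be consulted for the actual models and evaluation protocol.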

Publication types

  • Multicenter Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Benchmarking*
  • Clinical Decision Rules
  • Critical Care / standards*
  • Datasets as Topic
  • Hospital Mortality
  • Humans
  • Length of Stay
  • Machine Learning*
  • Software

Grants and funding

VO was supported by European Commission’s Horizon 2020 Project, WellCo, under grant agreement No 769765 (https://cordis.europa.eu/project/id/769765). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.