The Framing of machine learning risk prediction models illustrated by evaluation of sepsis in general wards

Simon Meyer Lauritsen; Bo Thiesson; Marianne Johansson Jørgensen; Anders Hammerich Riis; Ulrick Skipper Espelund; Jesper Bo Weile; Jeppe Lange

doi:10.1038/s41746-021-00529-x

NPJ Digit Med. 2021 Nov 15;4(1):158. doi: 10.1038/s41746-021-00529-x.

Authors

Simon Meyer Lauritsen^{1,2}, Bo Thiesson^{3,4}, Marianne Johansson Jørgensen⁵, Anders Hammerich Riis³, Ulrick Skipper Espelund^{5,6}, Jesper Bo Weile^{7,8}, Jeppe Lange^{9,5}

Affiliations

¹ Enversion A/S, Fiskerivej 12, 1st floor, 8000, Aarhus C, Denmark. sla@enversion.dk.
² Department of Clinical Medicine, Aarhus University, Aarhus N, Denmark. sla@enversion.dk.
³ Enversion A/S, Fiskerivej 12, 1st floor, 8000, Aarhus C, Denmark.
⁴ Department of Engineering, Aarhus University, Aarhus C, Denmark.
⁵ Department of Research, Horsens Regional Hospital, Horsens, Denmark.
⁶ Department of Anesthesiology, Horsens Regional Hospital, Horsens, Denmark.
⁷ Emergency Department, Horsens Regional Hospital, Horsens, Denmark.
⁸ Research Center for Emergency Medicine, Aarhus University Hospital, Aarhus, Denmark.
⁹ Department of Clinical Medicine, Aarhus University, Aarhus N, Denmark.

Abstract

Problem framing is critical to developing risk prediction models because all subsequent development work and evaluation take place within the context of how a problem has been framed. Explicit documentation of framing choices also makes it easier to compare evaluation metrics between published studies. In this work, we introduce the basic concepts of framing, including prediction windows, observation windows, window shifts and event-triggers for a prediction, which strongly affect the risk of clinician fatigue caused by false positives. Building on this, we apply four different framing structures to the same generic dataset, using a sepsis risk prediction model as an example, and evaluate how framing affects model performance and learning. Our results show that an apparently good model with strong evaluation results in both discrimination and calibration is not necessarily clinically usable. Therefore, it is important to assess the results of objective evaluations within the context of more subjective evaluations of how a model is framed.
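
As a rough illustration of how observation windows, prediction windows and window shifts interact, the following Python sketch generates labelled samples for a single admission under one possible sliding-window framing. It is not the authors' implementation and does not reproduce any of the four framing structures evaluated in the paper. The function name `sliding_frames`, the `Frame` type, the hour-based parameters and the rule of stopping at event onset are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    obs_start: float  # start of the observation window (hours since admission)
    obs_end: float    # end of the observation window, i.e. the prediction time
    pred_end: float   # end of the prediction window
    label: int        # 1 if the event onset falls inside the prediction window


def sliding_frames(
    stay_hours: float,
    obs_hours: float,
    pred_hours: float,
    shift_hours: float,
    onset_hour: Optional[float] = None,
) -> List[Frame]:
    """Hypothetical sliding-window framing for one admission.

    A prediction is made every `shift_hours`, using the preceding
    `obs_hours` of data, and is labelled positive when the event onset
    lies within the following `pred_hours`. Assumption: no further
    predictions are made once the onset has occurred.
    """
    frames: List[Frame] = []
    t = obs_hours  # first prediction once a full observation window exists
    while t <= stay_hours:
        if onset_hour is not None and onset_hour <= t:
            break  # assumed rule: stop predicting at or after onset
        positive = onset_hour is not None and t < onset_hour <= t + pred_hours
        frames.append(Frame(t - obs_hours, t, t + pred_hours, int(positive)))
        t += shift_hours  # slide both windows forward by the window shift
    return frames


if __name__ == "__main__":
    # Illustrative values only: 24-hour stay, onset at hour 17.
    for frame in sliding_frames(24, obs_hours=8, pred_hours=6,
                                shift_hours=4, onset_hour=17):
        print(frame)
```

In this sketch, a shorter window shift produces more predictions per admission, and therefore more opportunities for false positives, which is one way framing can influence clinician fatigue independently of discrimination or calibration.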

Grants and funding

8053-00076B/Innovationsfonden (Innovation Fund Denmark)