Explainable hierarchical clustering for patient subtyping and risk prediction

Enrico Werner; Jeffrey N Clark; Alexander Hepburn; Ranjeet S Bhamber; Michael Ambler; Christopher P Bourdeaux; Christopher J McWilliams; Raul Santos-Rodriguez

doi:10.1177/15353702231214253

Exp Biol Med (Maywood). 2023 Dec;248(24):2547-2559. doi: 10.1177/15353702231214253. Epub 2023 Dec 15.

Authors

Enrico Werner¹, Jeffrey N Clark¹, Alexander Hepburn¹, Ranjeet S Bhamber¹, Michael Ambler², Christopher P Bourdeaux³, Christopher J McWilliams⁴, Raul Santos-Rodriguez⁵

Affiliations

¹ University of Bristol, Bristol BS1 5DD, UK.
² University of Bristol, Bristol BS8 1TD, UK.
³ University Hospitals Bristol NHS Foundation Trust, Bristol BS2 8HW, UK.
⁴ University of Bristol, Bristol BS8 1TW, UK.
⁵ University of Bristol, Bristol BS8 1QU, UK.

Abstract

We present a pipeline in which machine learning techniques are used to automatically identify and evaluate subtypes of hospital patients admitted between 2017 and 2021 to a large UK teaching hospital. Patient clusters are determined using routinely collected hospital data, such as those used in the UK's National Early Warning Score 2 (NEWS2). An iterative, hierarchical clustering process is used to identify the minimum set of relevant features for cluster separation. State-of-the-art explainability techniques are then used to interpret the identified subtypes and assign them clinical meaning, which also illustrates their robustness. In parallel, clinicians assessed the intracluster similarities and intercluster differences of the identified patient subtypes in light of their clinical knowledge. For each cluster, outcome prediction models were trained, and their forecasting ability was compared with that of NEWS2 applied to the unclustered patient cohort. These preliminary results suggest that subtype-specific models can outperform the established NEWS2 method, providing improved prediction of patient deterioration. By considering both computational outputs and clinician-based explanations in patient subtyping, we aim to highlight the mutual benefit of combining machine learning techniques with clinical expertise.
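The iterative, hierarchical feature-selection step can be illustrated with a minimal pure-Python sketch. The abstract does not state the linkage, distance, or separation criterion used. This example therefore assumes average-linkage agglomerative clustering on z-scored features, Euclidean distance, and the mean silhouette coefficient as the measure of cluster separation, with greedy backward elimination to shrink the feature set. All function names (`zscore`, `average_linkage`, `silhouette`, `select_features`) are hypothetical, and the naive O(n³) merging loop is only suitable for toy data. It is not the authors' implementation.

```python
# Hypothetical sketch: average linkage, Euclidean distance and the silhouette
# criterion are assumptions for illustration, not the paper's stated method.
import math
from itertools import combinations
from statistics import mean, pstdev


def zscore(rows):
    """Standardize each feature column to zero mean and unit population SD.
    Constant columns are divided by 1.0 instead of 0 and so become all zeros."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def dist(a, b, feats):
    """Euclidean distance using only the feature indices in `feats`."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in feats))


def average_linkage(rows, feats, k):
    """Naive agglomerative clustering with average linkage,
    merging until k clusters remain. Returns one label per row."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = mean(dist(rows[p], rows[q], feats)
                     for p in clusters[i] for q in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i still points at the same cluster
        clusters[i].extend(merged)
    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


def silhouette(rows, feats, labels):
    """Mean silhouette coefficient (requires at least 2 clusters);
    points in singleton clusters contribute 0 by convention."""
    scores = []
    for p, lp in enumerate(labels):
        same = [dist(rows[p], rows[q], feats)
                for q, lq in enumerate(labels) if lq == lp and q != p]
        if not same:
            scores.append(0.0)
            continue
        a = mean(same)  # mean distance to the point's own cluster
        b = min(mean(dist(rows[p], rows[q], feats)
                     for q, lq in enumerate(labels) if lq == c)
                for c in set(labels) if c != lp)  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def select_features(rows, k, min_features=1):
    """Greedy backward elimination: repeatedly drop the feature whose removal
    gives the highest silhouette, stopping once every removal lowers it."""
    data = zscore(rows)
    feats = list(range(len(rows[0])))
    score = silhouette(data, feats, average_linkage(data, feats, k))
    while len(feats) > min_features:
        trials = []
        for f in feats:
            kept = [g for g in feats if g != f]
            trials.append((silhouette(data, kept, average_linkage(data, kept, k)), f))
        best_score, drop = max(trials)
        if best_score < score:
            break  # every candidate removal worsens separation
        feats.remove(drop)
        score = best_score
    return feats, score
```

For instance, with `rows = [[0.0, 1.0], [0.0, -1.0], [0.0, 0.0], [1.0, 1.0], [1.0, -1.0], [1.0, 0.0]]`, where the first column separates two groups and the second varies only within each group, `select_features(rows, k=2)` returns `([0], 1.0)`: dropping the second feature yields a perfect two-cluster split. On real patient data, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would replace the naive merging loop.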

Keywords: Hierarchical clustering; clinical evaluation; early warning score; explainability; mortality prediction; patient subtypes.

MeSH terms

Cluster Analysis*
Forecasting
Humans
Inpatients* / classification
Machine Learning*