Explainable hierarchical clustering for patient subtyping and risk prediction

Exp Biol Med (Maywood). 2023 Dec;248(24):2547-2559. doi: 10.1177/15353702231214253. Epub 2023 Dec 15.

Abstract

We present a pipeline in which machine learning techniques automatically identify and evaluate subtypes of patients admitted to a large UK teaching hospital between 2017 and 2021. Patient clusters are determined from routinely collected hospital data, such as the measurements used in the UK's National Early Warning Score 2 (NEWS2). An iterative, hierarchical clustering process was used to identify the minimum set of features relevant for cluster separation. Using state-of-the-art explainability techniques, the identified subtypes are interpreted and assigned clinical meaning, which also illustrates their robustness. In parallel, clinicians assessed intracluster similarities and intercluster differences of the identified patient subtypes in the context of their clinical knowledge. For each cluster, outcome prediction models were trained and their forecasting ability was compared with the NEWS2 of the unclustered patient cohort. These preliminary results suggest that subtype models can outperform the established NEWS2 method, providing improved prediction of patient deterioration. By considering both the computational outputs and clinician-based explanations in patient subtyping, we aim to highlight the mutual benefit of combining machine learning techniques with clinical expertise.
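
The idea of iteratively reducing features while preserving cluster separation can be illustrated with a minimal sketch. The abstract does not specify the linkage criterion, the stopping rule, or the feature-selection procedure, so everything below is an assumption made for illustration: average-linkage agglomerative clustering, a fixed number of clusters, greedy backward elimination, and six made-up patients with three hypothetical standardized features. It is not the study's implementation.

```python
# Illustrative sketch only: toy data, average linkage, and greedy backward
# elimination are assumptions; the study's actual method is not given here.
from itertools import combinations
from math import dist


def agglomerative(points, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between members of a and b.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


def project(rows, features):
    """Keep only the selected feature columns of each row."""
    return [tuple(row[f] for f in features) for row in rows]


def minimal_features(rows, k):
    """Drop each feature in turn if the k-cluster partition is unchanged."""
    features = list(range(len(rows[0])))
    reference = agglomerative(project(rows, features), k)
    for f in list(features):
        trial = [g for g in features if g != f]
        if trial and agglomerative(project(rows, trial), k) == reference:
            features = trial
    return features, reference


# Hypothetical patients: column 0 separates two groups, columns 1-2 are noise.
rows = [
    (0.0, 0.1, 0.00),
    (0.1, 0.0, 0.10),
    (0.2, 0.1, 0.05),
    (5.0, 0.0, 0.10),
    (5.1, 0.1, 0.00),
    (5.2, 0.0, 0.05),
]

features, clusters = minimal_features(rows, k=2)
print(features)  # indices of the features retained
print(clusters)  # patient indices grouped by cluster
```

On this toy data only the first feature is retained, because removing either noise column leaves the two-cluster partition unchanged while removing the first column does not. A real analysis would more plausibly rely on an established library implementation and a cluster-validity measure rather than exact partition equality.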

Keywords: Hierarchical clustering; clinical evaluation; early warning score; explainability; mortality prediction; patient subtypes.

MeSH terms

  • Cluster Analysis*
  • Forecasting
  • Humans
  • Inpatients* / classification
  • Machine Learning*