longSil: an Evaluation Metric to Assess Quality of Clustering Longitudinal Clinical Data

J Healthc Inform Res. 2019 Nov 19;3(4):441-459. doi: 10.1007/s41666-019-00058-z. eCollection 2019 Dec.

Abstract

Longitudinal disease subtyping is an important problem within the broader scope of computational phenotyping. In this article, we discuss several data-driven, unsupervised methods for obtaining disease subtypes from longitudinal clinical data. The methods are analyzed in the context of chronic kidney disease, one of the leading health problems both in the USA and worldwide. To provide a quantitative comparison of the different methods, we propose a novel evaluation metric that measures the tightness of each cluster and the degree of separation between the clusters produced by each method. Comparative results for two large clinical datasets are provided, along with key insights made possible by the proposed evaluation metric.

Keywords: Clustering; Computational phenotyping; Disease subtype; Evaluation metric; Silhouette coefficient.
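
For orientation, the abstract describes a metric that combines cluster tightness with separation between clusters, and the keywords name the silhouette coefficient. The sketch below shows the classic silhouette coefficient only; it is not the longSil metric, whose definition is not given in this abstract. It assumes each subject is represented as a fixed-length numeric vector compared by Euclidean distance, and the function names and toy data are hypothetical, made up for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_scores(points, labels):
    """Classic per-point silhouette s(i) = (b - a) / max(a, b).

    a: mean distance from point i to the other members of its own cluster
       (tightness); b: smallest mean distance from point i to the members
       of any other cluster (separation).
    NOTE: this is the standard silhouette, not the paper's longSil metric.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average silhouette over all points; values lie in [-1, 1]."""
    s = silhouette_scores(points, labels)
    return sum(s) / len(s)


# Toy vectors (assumed, not clinical data): two well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1]   # mixes the two groups

good = mean_silhouette(toy_points, good_labels)
bad = mean_silhouette(toy_points, bad_labels)
print(f"good clustering: {good:.3f}, bad clustering: {bad:.3f}")
```

A score near 1 indicates tight, well-separated clusters, while a negative score suggests points sit closer to another cluster than to their own. Longitudinal data typically has irregular visit times and varying sequence lengths, so a fixed-length Euclidean comparison like this one is a simplifying assumption.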