Multi-objective semi-supervised clustering to identify health service patterns for injured patients

Health Inf Sci Syst. 2019 Aug 29;7(1):18. doi: 10.1007/s13755-019-0080-6. eCollection 2019 Dec.

Abstract

Purpose: This study develops a pattern recognition method that identifies patterns based on their similarity and their association with the outcome of interest. The practical purpose of developing this pattern recognition method is to group patients, who are injured in transport accidents, in the early stages post-injury. This grouping is based on distinctive patterns in health service use within the first week post-injury. The groups also provide predictive information towards the total cost of medication process. As a result, the group of patients who have undesirable outcomes are identified as early as possible based health service use patterns.

Methods: We propose a multi-objective optimization model to group patients. An objective function is the cost function of k-medians clustering to recognize the similar patterns. Another objective function is the cross-validated root-mean-square error to examine the association with the total cost. The best grouping is obtained by minimizing both objective functions. As a result, the multi-objective optimization model is a semi-supervised clustering which learns health service use patterns in both unsupervised and supervised ways. We also introduce an evolutionary computation approach includes stochastic gradient descent and Pareto optimal solutions to find the optimal solution. In addition, we use the decision tree method to reproduce the optimal groups using an interpretable classification model.

Results: The results show that the proposed multi-objective semi-supervised clustering identifies distinct groups of health service uses and contributes to predict the total cost. The performance of the multi-objective model has been examined using two metrics such as the average silhouette width and the cross-validation error. The examination proves that the multi-objective model outperforms the single-objective ones. In addition, the interpretable classification model shows that imaging and therapeutic services are critical services in the first-week post-injury to group injured patients.

Conclusion: The proposed multi-objective semi-supervised clustering finds the optimal clusters that not only are well-separated from each other but can provide informative insights regarding the outcome of interest. It also overcomes two drawback of clustering methods such as being sensitive to the initial cluster centers and need for specifying the number of clusters.

Keywords: Evolutionary computation; Health service patterns; Injured patients; Multi-objective optimization; Semi-supervised clustering.