Improved Interpretability of Machine Learning Model Using Unsupervised Clustering: Predicting Time to First Treatment in Chronic Lymphocytic Leukemia

JCO Clin Cancer Inform. 2019 May:3:1-11. doi: 10.1200/CCI.18.00137.

Abstract

Purpose: Time to event is an important aspect of clinical decision making. This is particularly true when diseases have highly heterogeneous presentations and prognoses, as in chronic lymphocytic leukemia (CLL). Although machine learning methods can readily learn complex nonlinear relationships, many are criticized as inadequate because of limited interpretability. We propose using unsupervised clustering of the continuous output of machine learning models to provide discrete risk stratification for predicting time to first treatment in a cohort of patients with CLL.

Patients and methods: A total of 737 treatment-naïve patients with CLL diagnosed at Mayo Clinic were included in this study. We compared the predictive abilities of two survival models (Cox proportional hazards and random survival forest) and four classification methods (logistic regression, support vector machines, random forest, and gradient boosting machine). The predicted probability of treatment was then stratified into discrete risk groups using unsupervised clustering.

Results: Machine learning methods did not yield significantly more accurate predictions of time to first treatment. However, automated risk stratification provided by clustering better differentiated patients who were at risk for treatment within 1 year than models developed using standard survival analysis techniques.

Conclusion: Clustering the posterior probabilities produced by machine learning models provides a way to make those models more interpretable.
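
Illustrative sketch: the stratification step described above, clustering a model's continuous predicted probabilities into discrete risk groups, might look roughly like the following. This is not the authors' implementation. The abstract does not name the clustering algorithm or the number of groups, so the one-dimensional k-means, the choice of three groups, the group names, and the simulated probabilities are all assumptions made for illustration only.

```python
import random

def kmeans_1d(values, k, n_iter=100):
    """Plain Lloyd's k-means on scalars (an assumed choice of clustering method).

    Returns (centers sorted ascending, label per value), so label 0 is the
    group with the lowest mean predicted probability.
    """
    ordered = sorted(values)
    # Deterministic start: place initial centers at evenly spaced quantiles.
    centers = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]

    def assign(cs):
        # Each value goes to its nearest center.
        return [min(range(k), key=lambda j: abs(v - cs[j])) for v in values]

    for _ in range(n_iter):
        labels = assign(centers)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated

    centers = sorted(centers)
    return centers, assign(centers)

# Simulated stand-in for a classifier's predicted probability of treatment;
# these numbers are synthetic and are not data from the study.
random.seed(0)
probs = (
    [random.betavariate(2, 12) for _ in range(100)]
    + [random.betavariate(6, 6) for _ in range(100)]
    + [random.betavariate(12, 2) for _ in range(100)]
)

centers, labels = kmeans_1d(probs, k=3)

# Hypothetical group names; the abstract does not define them.
risk_names = ["low", "intermediate", "high"]
for j, name in enumerate(risk_names):
    members = [p for p, lab in zip(probs, labels) if lab == j]
    print(f"{name:>12}: n={len(members)}, center={centers[j]:.2f}")
```

Each patient then carries a discrete risk label in place of a raw probability, and the groups can be compared with standard tools such as Kaplan-Meier curves.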

MeSH terms

  • Adult
  • Aged
  • Cluster Analysis*
  • Female
  • Follow-Up Studies
  • Humans
  • Kaplan-Meier Estimate
  • Leukemia, Lymphocytic, Chronic, B-Cell / mortality*
  • Leukemia, Lymphocytic, Chronic, B-Cell / therapy
  • Machine Learning*
  • Male
  • Middle Aged
  • Models, Theoretical*
  • Prognosis
  • Reproducibility of Results
  • Risk Factors
  • Time-to-Treatment
  • Treatment Outcome