Validation of a Cyclic Algorithm to Proxy Number of Lines of Systemic Cancer Therapy Using Administrative Data

JCO Clin Cancer Inform. 2019 Aug:3:1-10. doi: 10.1200/CCI.19.00022.

Abstract

Purpose: Researchers are automating the process for identifying the number of lines of systemic cancer therapy received by patients. To date, algorithm development has involved manual modifications to predefined classification rules. In this study, we propose a supervised learning algorithm for determining the best-performing proxy for number of lines of therapy and validate this approach in four patient groups.

Materials and methods: We retrospectively analyzed BC Cancer pharmacy records from patients' cancer diagnosis until end of follow-up (cohort-specific, 2014/2015). We created and validated a cyclic algorithm in patients with advanced cancers of varying histologies, diffuse large B-cell lymphoma, follicular lymphoma, and chronic lymphocytic leukemia. To assess internal and external validity, we used a split-sample approach for all analyses and considered lines of therapy identified through manual review as our criterion standard. We measured agreement using correlation coefficients, mean squared error, nonparametric hypothesis testing, and quantile-quantile plots.
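As an illustration of two of the agreement measures named above, the following is a minimal sketch, not the study's code. It assumes each patient record is a hypothetical `(proxy, manual)` pair, where `proxy` is the algorithm-derived number of lines of therapy and `manual` is the count from manual review (the criterion standard). The function names, the 50/50 split, and the fixed seed are assumptions made for the example; the abstract does not state how the split was drawn.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_squared_error(x, y):
    """Average squared difference between proxy and criterion values."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def split_sample(records, train_fraction=0.5, seed=0):
    """Randomly partition records into within-sample and out-of-sample sets.

    The fraction and seed are illustrative assumptions, not study parameters.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def agreement(records):
    """Return (Pearson correlation, MSE) for a list of (proxy, manual) pairs."""
    proxy = [p for p, _ in records]
    manual = [m for _, m in records]
    return pearson(proxy, manual), mean_squared_error(proxy, manual)


if __name__ == "__main__":
    # Made-up (proxy, manual) counts for illustration only; not study data.
    records = [(1, 1), (2, 2), (3, 2), (1, 1), (4, 3), (2, 3), (1, 2), (3, 3)]
    within, out_of = split_sample(records)
    print("within-sample:", agreement(within))
    print("out-of-sample:", agreement(out_of))
```

In a setup like this, the proxy would be tuned on the within-sample half only, and the out-of-sample half would be scored once to check how well the agreement carries over.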

Results: Cohorts comprised 91 patients with advanced cancers, 121 with chronic lymphocytic leukemia, 440 with follicular lymphoma, and 679 with diffuse large B-cell lymphoma. Number of lines of therapy received and patients' treatment period length varied substantially across cohorts. Despite these differences, our algorithm successfully identified a best-performing proxy for number of lines of therapy for each cohort. This proxy was moderately to highly correlated with the criterion standard (within-sample: 0.73 ≤ Pearson correlation ≤ 0.84; out-of-sample: 0.52 ≤ Pearson correlation ≤ 0.76), and its distribution did not differ significantly from that of the criterion standard within or out of sample (P > .10).

Conclusion: Supervised learning is an ideal tool for generating a best-performing proxy that recognizes prescription drug patterns and approximates the number of lines of therapy. Our cyclic approach can be used in jurisdictions with access to administrative pharmacy data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Antineoplastic Combined Chemotherapy Protocols / therapeutic use
  • Humans
  • Neoplasms / pathology
  • Neoplasms / therapy*
  • Registries
  • Research Design
  • Supervised Machine Learning*