Temporal phenotyping by mining healthcare data to derive lines of therapy for cancer

J Biomed Inform. 2019 Dec:100:103335. doi: 10.1016/j.jbi.2019.103335. Epub 2019 Nov 2.

Abstract

Lines of therapy (LOT) derived from real-world healthcare data not only depict real-world cancer treatment sequences, but also help define patient phenotypes along the course of disease progression and therapeutic interventions. The sequence of prescribed anticancer therapies can be defined as temporal phenotyping resulting from changes in morphological (tumor staging), biochemical (biomarker testing), physiological (disease progression), and behavioral (physician prescribing and patient adherence) parameters. We introduce a novel two-part methodology, applied within a large insurance claims dataset: 1) create an algorithm to derive patient-level LOT and 2) aggregate LOT information via clustering, in conjunction with visualization techniques, to derive temporal phenotypes. We demonstrated the methodology using two examples: metastatic non-small cell lung cancer and metastatic melanoma. First, we generated a longitudinal patient cohort for each cancer type and applied a set of rules to derive patient-level LOT. Then the LOT algorithm outputs for each cancer type were visualized using Sankey plots and clustered with K-means based on the durations of LOT and of the gaps in therapy between LOT. We found a differential distribution of temporal phenotypes across clusters. Our approach to identifying temporal patient phenotypes can increase the quality and utility of analyses conducted using claims datasets, with the potential for application to multiple oncology disease areas across diverse healthcare data sources. Understanding LOT as defining patients' temporal phenotypes can contribute to continuous health learning of disease progression and its interaction with different treatment pathways; in addition, this understanding can provide new insights that can be applied to tailor treatment sequences to the patient phenotypes that will benefit.
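The two-part approach described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state the actual rule set, so the two rules used here (a 28-day regimen window and a 90-day maximum gap), the names `Line`, `derive_lines`, `duration_and_gap_features`, and `kmeans`, and all sample data are assumptions made for illustration. The K-means routine is a plain Lloyd's iteration with a simplified, deterministic initialization (the first k points), chosen only to keep the example self-contained.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical rule parameters; the source does not specify its thresholds.
REGIMEN_WINDOW_DAYS = 28  # drugs first seen within this window join the regimen
MAX_GAP_DAYS = 90         # a longer gap between claims starts a new line


@dataclass
class Line:
    number: int
    start: date
    end: date
    regimen: set = field(default_factory=set)


def derive_lines(claims):
    """Derive lines of therapy from (service_date, drug) claims of ONE patient."""
    lines = []
    for day, drug in sorted(claims):
        current = lines[-1] if lines else None
        if current is None:
            new_line = True
        elif (day - current.end).days > MAX_GAP_DAYS:
            new_line = True  # assumed rule: long gap in therapy
        elif (drug not in current.regimen
              and (day - current.start).days > REGIMEN_WINDOW_DAYS):
            new_line = True  # assumed rule: new drug after the regimen window
        else:
            new_line = False
        if new_line:
            lines.append(Line(len(lines) + 1, day, day, {drug}))
        else:
            current.regimen.add(drug)
            current.end = day
    return lines


def duration_and_gap_features(lines):
    """Return (durations of each line, gaps between consecutive lines) in days."""
    durations = [(ln.end - ln.start).days for ln in lines]
    gaps = [(b.start - a.end).days for a, b in zip(lines, lines[1:])]
    return durations, gaps


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; initial centroids are simply the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up claims for a single patient.
claims = [
    (date(2018, 1, 5), "carboplatin"),
    (date(2018, 1, 5), "pemetrexed"),
    (date(2018, 1, 26), "carboplatin"),
    (date(2018, 2, 16), "pemetrexed"),
    (date(2018, 4, 10), "nivolumab"),
    (date(2018, 4, 24), "nivolumab"),
]
patient_lines = derive_lines(claims)
durations, gaps = duration_and_gap_features(patient_lines)
print(durations, gaps)

# Made-up per-patient feature vectors: (line 1 duration, gap before line 2).
features = [(42, 53), (40, 50), (200, 5), (210, 7)]
labels, centroids = kmeans(features, k=2)
print(labels)
```

In practice, a library implementation of K-means with standardized features and a principled choice of k would likely replace the toy routine; the sketch only shows how LOT durations and gaps in therapy can serve as clustering inputs.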

Keywords: Claims database; K-means clustering analysis; Oncology line of therapy; Patient-level; Temporal phenotyping; Treatment sequence.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Carcinoma, Non-Small-Cell Lung / pathology
  • Carcinoma, Non-Small-Cell Lung / therapy*
  • Data Mining*
  • Humans
  • Lung Neoplasms / pathology
  • Lung Neoplasms / therapy*
  • Melanoma / pathology
  • Melanoma / therapy*
  • Phenotype*
  • Skin Neoplasms / pathology
  • Skin Neoplasms / therapy*