Facilitating Cancer Epidemiologic Efforts in Cleveland via Creation of Longitudinal De-Duplicated Patient Data Sets

Cancer Epidemiol Biomarkers Prev. 2020 Apr;29(4):787-795. doi: 10.1158/1055-9965.EPI-19-0815. Epub 2020 Jan 27.

Abstract

Background: Cleveland, Ohio, is home to three major hospital systems serving approximately 80% of the Northeast Ohio population. The Cleveland Clinic, University Hospitals Health System, and MetroHealth are direct competitors for primary and specialty care, and patient overlap between these systems is high. Fragmentation of health data that exist in silos at these health systems produces an overestimation of disease burden due to double and sometimes triple counting of patients. As a result, longitudinal population-based studies across the Cleveland patient population are impeded unless accurate and actionable clinically derived health data sets can be created.

Methods: The Cleveland Institute for Computational Biology has developed the De-Duplicate and De-Identify Research Engine (DeDeRE), which de-duplicates patients across two or more health entities without any exchange of personal health identifiers (PHI) between health systems.

Results: The immediate utility of this software for cancer epidemiology is the increased accuracy in measuring cancer burden and the potential to perform longitudinal studies with de-duplicated, de-identified data sets.

Conclusions: The DeDeRE software developed and tested here accomplishes its goals without exposing PHIs using a state-of-the-art, trusted privacy preservation network enabled by a hash-based matching algorithm.
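
The abstract does not specify the hashing scheme, so the following is only a minimal sketch of how hash-based matching can de-duplicate patients without sharing identifiers. It is not DeDeRE's actual algorithm. The keyed HMAC-SHA256 hash, the shared secret, the choice of name and date-of-birth fields, and all function names are assumptions made for illustration.

```python
import hashlib
import hmac
import unicodedata


def normalize(first, last, dob):
    """Canonicalize identifiers so formatting differences do not change the hash."""
    def clean(text):
        # Decompose accented characters, then keep only letters and digits.
        text = unicodedata.normalize("NFKD", text)
        return "".join(ch for ch in text if ch.isalnum()).upper()
    return "|".join([clean(first), clean(last), dob.strip()])


def patient_token(secret, first, last, dob):
    """Return a keyed hash of the normalized identifiers (hypothetical helper)."""
    message = normalize(first, last, dob).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def tokenize(secret, records):
    """Hash every record locally; only these tokens would leave a health system."""
    return {patient_token(secret, *record) for record in records}


# Hypothetical shared key and made-up records, for illustration only.
SECRET = b"hypothetical-shared-key"
system_a = [("Ann", "Lee", "1960-02-01"), ("Bob", "Ray", "1955-07-30")]
system_b = [("ann", "LEE ", "1960-02-01"), ("Cy", "Fox", "1971-11-12")]

tokens_a = tokenize(SECRET, system_a)
tokens_b = tokenize(SECRET, system_b)

# Summing per-system counts double counts the shared patient.
naive_count = len(tokens_a) + len(tokens_b)
# The union of tokens counts each matched patient once.
unique_count = len(tokens_a | tokens_b)

print(naive_count, unique_count)
```

In this sketch each system hashes its own records, so only tokens are compared. Exact-match hashing of this kind misses patients whose identifiers differ by a typo or a name change, which a production system would have to handle separately.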

Impact: This paper will guide the reader through the functions currently developed in DeDeRE and how a healthcare organization (HCO) employing the release version of this technology can begin sharing data with one or more additional HCOs in a collaborative and noncompetitive manner to create a regional population health resource for cancer researchers. See all articles in this CEBP Focus section, "Modernizing Population Science."

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cities / epidemiology
  • Confidentiality
  • Datasets as Topic*
  • Health Information Exchange*
  • Health Records, Personal*
  • Humans
  • Neoplasms / epidemiology*
  • Ohio
  • Software