Building a Harmonized Datamart by Integrating Cross-Institutional Systems of Clinical, Outcome, and Genomic Data: The Pediatric Patient Informatics Platform (PPIP)

JCO Clin Cancer Inform. 2021 Feb:5:202-215. doi: 10.1200/CCI.20.00083.

Abstract

Purpose: Siloing of electronic medical data limits their utility and accessibility. At the Dana-Farber/Boston Children's Cancer and Blood Disorders Center, cross-institutional data were inconsistent and difficult to access. To unify data for clinical operations, administration, and research, we developed the Pediatric Patient Informatics Platform (PPIP), an integrated datamart that harmonizes multiple source systems across two institutions into a common technology.

Patients and methods: Starting in 2009, user requirements were gathered and data sources were prioritized. Project teams, including biostatisticians, database developers, and an external contractor, were formed. Read access to source systems was established. The 3-layer PPIP architecture was developed: STAGING, a near-exact copy of source data; INTEGRATION, where data were reorganized into domains; and CONSUMPTION, where data were optimized for rapid retrieval. The diverse systems were integrated into a common IBM Netezza technology. Data filters were defined to accurately capture the Center's patients, and derived data items were created for harmonization across sources. An interactive online query tool, PPIP360, was developed using MicroStrategy Analytics.
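
The three layers described above can be illustrated with a minimal sketch. It is not the PPIP implementation: it uses an in-memory SQLite database as a stand-in for IBM Netezza, and every table name, column name, code value, and filter value (for example `stg_a_patients`, `gender_code`, `PEDI_ONC`, `HEME_ONC`) is a hypothetical placeholder chosen for illustration. It shows only the general pattern of copying sources as-is, applying a patient filter and a derived harmonized item, and then materializing a retrieval-friendly table.

```python
import sqlite3


def build_datamart(source_a_rows, source_b_rows):
    """Toy STAGING -> INTEGRATION -> CONSUMPTION flow (all names hypothetical)."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    # STAGING: near-exact copies of each source, keeping source-specific
    # column names and code sets untouched.
    cur.execute("CREATE TABLE stg_a_patients (mrn TEXT, sex TEXT, dept TEXT)")
    cur.execute(
        "CREATE TABLE stg_b_patients "
        "(patient_id TEXT, gender_code INTEGER, clinic TEXT)"
    )
    cur.executemany("INSERT INTO stg_a_patients VALUES (?, ?, ?)", source_a_rows)
    cur.executemany("INSERT INTO stg_b_patients VALUES (?, ?, ?)", source_b_rows)

    # INTEGRATION: reorganize into a single patient domain. The WHERE clauses
    # play the role of data filters (assumed department/clinic codes), and
    # sex_harmonized is a derived item that maps two different source
    # encodings onto one shared vocabulary.
    cur.execute(
        """
        CREATE TABLE int_patient AS
        SELECT 'A' AS source,
               mrn AS source_patient_id,
               CASE UPPER(sex)
                   WHEN 'F' THEN 'Female'
                   WHEN 'M' THEN 'Male'
                   ELSE 'Unknown'
               END AS sex_harmonized
        FROM stg_a_patients
        WHERE dept = 'PEDI_ONC'
        UNION ALL
        SELECT 'B',
               patient_id,
               CASE gender_code
                   WHEN 1 THEN 'Male'
                   WHEN 2 THEN 'Female'
                   ELSE 'Unknown'
               END
        FROM stg_b_patients
        WHERE clinic = 'HEME_ONC'
        """
    )

    # CONSUMPTION: a pre-aggregated table so that reports read a small,
    # ready-made result instead of recomputing it from the domain tables.
    cur.execute(
        """
        CREATE TABLE con_patient_counts AS
        SELECT sex_harmonized, COUNT(*) AS n_patients
        FROM int_patient
        GROUP BY sex_harmonized
        """
    )
    con.commit()
    return con
```

Calling `build_datamart` with a few rows per source and then querying `con_patient_counts` returns one row per harmonized category. The sketch deliberately omits patient matching across the two sources, which a real cross-institutional datamart would need to address.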

Results: Driven by scientific objectives, the PPIP datamart was created, including 33,674 patients, 2,983 protocols, and 3.6 million patient visits from 14 source databases, 164 source tables, and 2,622 source data items. The PPIP360 has 605 data items and 33 metrics across 11 reports and dashboards. Dana-Farber and Boston Children's established a legal data-sharing agreement. The PPIP has supported hundreds of faculty, staff, and projects, including planning clinical trials and informing strategic planning.

Conclusion: The PPIP has successfully harmonized and integrated diagnostic, demographic, laboratory, treatment, clinical outcome, pathology, transplant, meta-protocol, and -omics data to support efficient daily operational and research activities at the Dana-Farber/Boston Children's Cancer and Blood Disorders Center and to enable future external sharing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Child
  • Databases, Factual
  • Genomics
  • Humans
  • Information Dissemination*
  • Information Storage and Retrieval*