Why Are Data Missing in Clinical Data Warehouses? A Simulation Study of How Data Are Processed (and Can Be Lost)

Stud Health Technol Inform. 2023 May 18;302:202-206. doi: 10.3233/SHTI230103.

Abstract

In recent years, the development of clinical data warehouses (CDWs) has put electronic health record (EHR) data in the spotlight. More and more innovative healthcare technologies are based on these EHR data. However, quality assessments of EHR data are fundamental to gaining confidence in the performance of these new technologies. The infrastructure developed to access EHR data, the CDW, can affect EHR data quality, but its impact is difficult to measure. We conducted a simulation on the Assistance Publique - Hôpitaux de Paris (AP-HP) infrastructure to assess how a study on breast cancer care pathways could be affected by the complexity of the data flows between the AP-HP Hospital Information System, the CDW, and the analysis platform. A model of the data flows was developed. We retraced the flows of specific data elements for a simulated cohort of 1,000 patients. We estimated that 756 [743;770] patients in the "best case" scenario (losses affect the same patients) and 423 [367;483] patients in a random distribution scenario (losses affect patients at random) had all the data elements necessary to reconstruct the care pathway in the analysis platform.
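
The gap between the two scenarios can be illustrated with a minimal Python sketch. It is not the authors' model: the function names, the number of data elements, and the retention rates are hypothetical and chosen only for illustration. The sketch assumes that each data element survives the data flows with a fixed probability, that "best case" losses are nested (one shared draw per patient, so a patient who loses a well-retained element also loses every less well retained one), and that random losses are independent across elements.

```python
import math
import random


def expected_complete(n_patients, retention_rates):
    """Closed-form expected counts of complete patients: (best case, random)."""
    # Best case: losses overlap, so only the worst-retained element matters.
    best_case = n_patients * min(retention_rates)
    # Random: independent losses, so the retention probabilities multiply.
    random_case = n_patients * math.prod(retention_rates)
    return best_case, random_case


def simulate_complete(n_patients, retention_rates, scenario, rng):
    """Count simulated patients who keep every data element."""
    complete = 0
    for _ in range(n_patients):
        if scenario == "best_case":
            # One shared draw per patient makes the losses nested.
            u = rng.random()
            kept_all = all(u < r for r in retention_rates)
        elif scenario == "random":
            # One independent draw per data element.
            kept_all = all(rng.random() < r for r in retention_rates)
        else:
            raise ValueError(f"unknown scenario: {scenario}")
        complete += kept_all
    return complete


if __name__ == "__main__":
    rates = [0.9, 0.8, 0.7]  # hypothetical retention rates, not from the study
    rng = random.Random(0)
    print(expected_complete(1000, rates))
    print(simulate_complete(1000, rates, "best_case", rng))
    print(simulate_complete(1000, rates, "random", rng))
```

Under these assumptions, the best-case count is bounded by the single worst-retained element, whereas the random count shrinks with every additional element that must be present, which is why the two estimates can differ substantially for the same per-element losses.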

Keywords: Clinical data warehouse; EHR data; data quality; simulation.

MeSH terms

  • Computer Simulation
  • Data Warehousing*
  • Delivery of Health Care
  • Electronic Health Records
  • Hospital Information Systems*
  • Humans