Multisource and temporal variability in Portuguese hospital administrative datasets: Data quality implications

Júlio Souza; Ismael Caballero; João Vasco Santos; Mariana Lobo; Andreia Pinto; João Viana; Carlos Sáez; Fernando Lopes; Alberto Freitas

doi:10.1016/j.jbi.2022.104242

J Biomed Inform. 2022 Dec:136:104242. doi: 10.1016/j.jbi.2022.104242. Epub 2022 Nov 11.

Authors

Júlio Souza¹, Ismael Caballero², João Vasco Santos³, Mariana Lobo⁴, Andreia Pinto⁴, João Viana⁴, Carlos Sáez⁵, Fernando Lopes⁴, Alberto Freitas⁴

Affiliations

¹ Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine, University of Porto, Porto, Portugal; Center for Health Technology and Services Research (CINTESIS), Faculty of Medicine, University of Porto, Porto, Portugal. Electronic address: juliobsouza@med.up.pt.
² University of Castilla-La Mancha, Ciudad Real, Castilla-La Mancha, Spain.
³ Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine, University of Porto, Porto, Portugal; Center for Health Technology and Services Research (CINTESIS), Faculty of Medicine, University of Porto, Porto, Portugal; Public Health Unit, ACES Grande Porto V - Porto Ocidental, ARS Norte, Portugal.
⁴ Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine, University of Porto, Porto, Portugal; Center for Health Technology and Services Research (CINTESIS), Faculty of Medicine, University of Porto, Porto, Portugal.
⁵ Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones (ITACA), Universitat Politècnica de València (UPV), Spain.

PMID: 36372346
DOI: 10.1016/j.jbi.2022.104242

Abstract

Background: Unexpected variability across healthcare datasets may indicate data quality issues and thereby affect the credibility of these data for reutilization. No gold-standard reference dataset or methods for variability assessment are usually available for these datasets. In this study, we aim to describe the process of discovering data quality implications by applying a set of methods for assessing variability between sources and over time in a large hospital database.

Methods: We described and applied a set of multisource and temporal variability assessment methods to a large Portuguese hospitalization database, in which variation in condition-specific hospitalization ratios derived from clinically coded data was assessed between hospitals (sources) and over time. We identified condition-specific admissions using the Clinical Classifications Software (CCS), developed by the Agency for Healthcare Research and Quality. A Statistical Process Control (SPC) approach based on funnel plots of condition-specific standardized hospitalization ratios (SHR) was used to assess multisource variability, whereas temporal heat maps and Information-Geometric Temporal (IGT) plots were used to assess temporal variability by displaying abrupt temporal changes in data distributions. Results were presented for the 15 most common inpatient conditions (CCS) in Portugal.
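
As an illustration of the funnel-plot idea, the following minimal sketch computes a standardized ratio by indirect standardization and checks it against approximate control limits. It is not the authors' implementation: the stratum definitions, the function names, and the normal approximation to Poisson limits around a target ratio of 1 are all assumptions made for this example.

```python
import math


def expected_admissions(reference_rates, hospital_strata):
    """Indirect standardization (assumed here): sum, over strata such as
    age-sex groups, of the reference rate times the hospital's stratum size."""
    return sum(reference_rates[s] * n for s, n in hospital_strata.items())


def shr(observed, expected):
    """Standardized hospitalization ratio: observed over expected admissions."""
    return observed / expected


def funnel_limits(expected, z=3.09):
    """Approximate control limits around a target ratio of 1.

    Assumes observed counts are Poisson, so the ratio's standard error is
    roughly sqrt(1 / expected). z = 3.09 gives about 99.8% limits and
    z = 1.96 about 95% limits (illustrative choices).
    """
    half_width = z * math.sqrt(1.0 / expected)
    return max(0.0, 1.0 - half_width), 1.0 + half_width


def classify(observed, expected, z=3.09):
    """Label a hospital as 'low', 'high', or 'in control' for one condition."""
    ratio = shr(observed, expected)
    lower, upper = funnel_limits(expected, z)
    if ratio < lower:
        return "low"
    if ratio > upper:
        return "high"
    return "in control"


# Hypothetical example data: two strata with made-up reference rates.
rates = {"under_65": 0.02, "65_plus": 0.10}
strata = {"under_65": 1000, "65_plus": 500}
expected = expected_admissions(rates, strata)
status = classify(observed=140, expected=expected)
```

The limits narrow as the expected count grows, which is what gives the plot its funnel shape: small hospitals need a much larger deviation from 1 than large ones before they are flagged.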

Main findings: Funnel plot assessment allowed the detection of several outlying hospitals whose SHRs were much lower or higher than expected. Adjusting SHR for hospital characteristics, beyond age and sex, considerably affected the degree of multisource variability for most diseases. Overall, probability distributions changed over time for most diseases, although heterogeneously. Abrupt temporal changes in data distributions for acute myocardial infarction and congestive heart failure coincided with the periods comprising the transition to the International Classification of Diseases, 10th revision, Clinical Modification, whereas changes in the Diagnosis-Related Groups software seem to have driven changes in data distributions for both acute myocardial infarction and liveborn admissions. The analysis of heat maps also allowed the detection of several discontinuities at hospital level over time, in some cases also coinciding with the aforementioned factors.
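
The notion of an abrupt temporal change in a data distribution can be sketched as a distance between the distributions of consecutive time batches. The example below assumes the Jensen-Shannon distance over categorical codes and monthly batches; the function names and toy data are hypothetical, and the sketch does not reproduce the projection step used to draw an IGT plot.

```python
import math
from collections import Counter


def distribution(values, categories):
    """Relative frequency of each category within one temporal batch."""
    counts = Counter(values)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("batch contains none of the listed categories")
    return [counts[c] / total for c in categories]


def js_distance(p, q):
    """Jensen-Shannon distance (base 2), bounded between 0 and 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero probability contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))  # guard against rounding below 0


def consecutive_distances(batches, categories):
    """Distance between each batch and the next; a spike suggests an abrupt change."""
    dists = [distribution(b, categories) for b in batches]
    return [js_distance(a, b) for a, b in zip(dists, dists[1:])]


# Hypothetical monthly batches of diagnosis codes: stable, then a sudden shift.
months = [["A", "A", "B"], ["A", "A", "B"], ["B", "B", "B"]]
changes = consecutive_distances(months, categories=["A", "B"])
```

In this toy series the first distance is zero because the first two months are identical, and the second is larger because the code mix shifts entirely to one category, the kind of pattern a coding-system transition could produce.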

Conclusions: This paper described the successful application of a set of reproducible, generalizable and systematic methods for variability assessment, including visualization tools that can be useful for detecting abnormal patterns in healthcare data, while also addressing some limitations of common approaches. The presented method for multisource variability assessment is based on SPC, which is an advantage given the lack of a gold standard for such a process. Properly controlling for hospital characteristics and differences in case-mix when estimating SHR is critical for isolating data quality-related variability among data sources. The use of IGT plots provides an advantage over common methods for temporal variability assessment due to its suitability for multitype and multimodal data, which are common characteristics of healthcare data. The novelty of this work is the use of a set of methods to discover new data quality insights in healthcare data.

Keywords: Clinical classification software; Clinical coding; Data quality; Data variability; International classification of diseases.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Data Accuracy*
Hospitalization
Hospitals
Humans
Myocardial Infarction*
Portugal