Methods for dealing with discrepant records in linked population health datasets: a cross-sectional study

BMC Health Serv Res. 2007 Jan 30:7:12. doi: 10.1186/1472-6963-7-12.

Abstract

Background: Linked population health data are increasingly used in epidemiological studies. If data items are recorded in more than one dataset, data linkage can reduce the under-ascertainment associated with many population health datasets. However, this raises the possibility of discrepant case reports from different datasets.

Methods: We examined the effect of four methods of classifying discrepant reports from different population health datasets on the estimated prevalence of hypertensive disorders of pregnancy and on the adjusted odds ratios (aOR) for known risk factors. Data were obtained from linked, validated birth and hospital data for women who gave birth in a hospital in New South Wales, Australia, in 2000-2002.
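The abstract does not spell out the four classification methods. As a rough illustration only, the sketch below contrasts two rules, counting a case only when both datasets agree versus counting any report, on made-up paired records. The labels, function names and example records are hypothetical and are not the study's coding scheme.

```python
from typing import Optional, Set

# Hypothetical labels; the study's actual coding scheme is not given in the abstract.
CHRONIC = "chronic"
PREGNANCY = "pregnancy"


def classify_perfect(birth: Optional[str], hospital: Optional[str]) -> Set[str]:
    """Count a case only when both datasets report the same hypertension type."""
    if birth is not None and birth == hospital:
        return {birth}
    return set()


def classify_any(birth: Optional[str], hospital: Optional[str]) -> Set[str]:
    """Count a case for every hypertension type reported in either dataset."""
    return {report for report in (birth, hospital) if report is not None}


def prevalence(records, rule, kind: str) -> float:
    """Share of women classified as `kind` under the given rule."""
    hits = sum(1 for birth, hospital in records if kind in rule(birth, hospital))
    return hits / len(records)


# Made-up paired reports (birth record, hospital record); None = no hypertension reported.
records = [
    (PREGNANCY, PREGNANCY),  # both datasets agree
    (PREGNANCY, None),       # reported in one dataset only
    (None, CHRONIC),         # reported in one dataset only
    (CHRONIC, PREGNANCY),    # conflicting types
    (None, None),            # both datasets agree: no hypertension
]

strict = prevalence(records, classify_perfect, PREGNANCY)
inclusive = prevalence(records, classify_any, PREGNANCY)
print(f"pregnancy hypertension: strict={strict:.0%}, inclusive={inclusive:.0%}")
```

On these toy records the strict rule can never give a higher prevalence than the inclusive rule, because every case it counts is also counted when all reports are included.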

Results: Among 250,173 women with linked data, 238,412 (95.3%) had perfect agreement on the occurrence of hypertension, 1,577 (0.6%) had imperfect agreement, 9,369 (3.7%) had hypertension reported in only one dataset (under-reporting) and 815 (0.3%) had conflicting types of hypertension. Using only perfect agreement between birth and discharge data resulted in the lowest prevalence rates (0.3% chronic, 5.1% pregnancy hypertension), while including all reports resulted in the highest prevalence rates (1.1% chronic, 8.7% pregnancy hypertension). The higher prevalence rates were generally consistent with international reports. In contrast, perfect agreement gave the highest aOR (95% confidence interval) for known risk factors: the aOR for chronic hypertension with maternal age ≥40 years was 4.0 (2.9, 5.3) and the aOR for pregnancy hypertension with multiple birth was 2.8 (2.5, 3.2).
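The agreement counts reported above can be checked with a few lines of arithmetic. The sketch below uses only the figures from the Results; the dictionary keys are shorthand labels chosen here, not terms defined by the study.

```python
# Counts taken from the Results; keys are shorthand labels for the four groups.
total = 250_173
counts = {
    "perfect agreement": 238_412,
    "imperfect agreement": 1_577,
    "one dataset only": 9_369,
    "conflicting types": 815,
}

# The four groups should account for every woman with linked data.
accounted_for = sum(counts.values())

# Percentages rounded to one decimal place, as in the abstract.
percentages = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The four groups sum to the full cohort, and the rounded shares match the reported 95.3%, 0.6%, 3.7% and 0.3%.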

Conclusion: The method chosen for classifying discrepant case reports should vary depending on the study question; all reports should be used as part of calculating the range of prevalence estimates, but perfect matches may be best suited to risk factor analyses. These findings are likely to be applicable to the linkage of any specialised health services datasets to population data that include information on diagnoses or procedures.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cross-Sectional Studies
  • Databases, Factual*
  • Demography
  • Female
  • Humans
  • Hypertension, Pregnancy-Induced / classification
  • Hypertension, Pregnancy-Induced / epidemiology*
  • Inpatients / statistics & numerical data
  • Medical Record Linkage*
  • Midwifery / statistics & numerical data
  • New South Wales / epidemiology
  • Odds Ratio
  • Pregnancy
  • Prevalence
  • Risk Factors