Measuring follow-up time in routinely-collected health datasets: Challenges and solutions

PLoS One. 2020 Feb 11;15(2):e0228545. doi: 10.1371/journal.pone.0228545. eCollection 2020.

Abstract

A key requirement for longitudinal studies using routinely-collected health data is being able to determine which individuals are present in the datasets used, and over what time period. Individuals can enter and leave the covered population of administrative datasets for a variety of reasons, including both life events and characteristics of the datasets themselves. An automated, customizable method of determining individuals' presence was developed for the primary care dataset in Swansea University's SAIL Databank. The primary care dataset covers only a portion of Wales, with 76% of practices participating. The start and end dates of the data vary by practice. Additionally, individuals can change practices or leave Wales. To address these issues, a two-step process was developed. First, the period for which each practice had data available was calculated by measuring changes in the rate of events recorded over time. Second, the registration records for each individual were simplified, and anomalies such as short gaps and overlaps were resolved by applying a set of rules. The result of these two analyses was a cleaned set of records giving the start and end dates of available primary care data for each individual. Analysis of GP records showed that 91.0% of events occurred within the periods the algorithm calculated as having available data. Of those events, 98.4% were recorded at the same practice as the registration computed by the algorithm. A standardized method for solving this common problem has enabled faster development of studies using this dataset. Using a rigorous, tested, standardized method of verifying presence in the study population will also positively influence the quality of research.
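The first step, estimating when each practice has usable data, can be illustrated with a minimal Python sketch. The abstract does not specify how changes in the event rate were detected, so this example swaps in a deliberately simple rule as an assumption: a month counts as fully recorded when its event count reaches at least half (the hypothetical `threshold` parameter) of the practice's median monthly count. The function name `practice_coverage` and the month-keyed input are illustrative, not the SAIL implementation.

```python
import calendar
from datetime import date
from statistics import median


def practice_coverage(monthly_counts, threshold=0.5):
    """Estimate the window of usable data for one practice (illustrative).

    Assumption: monthly_counts maps the first day of each calendar month to
    the number of events the practice recorded in that month. A month is
    treated as fully recorded when its count is at least `threshold` times
    the practice's median monthly count. The window runs from the first such
    month to the last day of the last such month.
    """
    if not monthly_counts:
        return None
    typical = median(monthly_counts.values())
    full = sorted(m for m, n in monthly_counts.items() if n >= threshold * typical)
    if not full:  # only possible when threshold > 1
        return None
    last = full[-1]
    last_day = calendar.monthrange(last.year, last.month)[1]
    return full[0], date(last.year, last.month, last_day)


# Sparse months at either end (data extraction starting and stopping) are excluded.
counts = {
    date(2000, 1, 1): 2,
    date(2000, 2, 1): 100,
    date(2000, 3, 1): 120,
    date(2000, 4, 1): 110,
    date(2000, 5, 1): 3,
}
print(practice_coverage(counts))
# → (datetime.date(2000, 2, 1), datetime.date(2000, 4, 30))
```

A median-relative threshold is used here only because it is easy to follow and resists a few unusually busy months. A real changepoint method would also catch gradual ramp-ups, and low-volume months in the middle of the window are not split out by this rule.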
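The second step, simplifying registrations and combining them with each practice's data window, can be sketched in the same hedged way. The abstract says only that short gaps and overlaps were resolved by a set of rules, so the rules below are assumptions chosen for illustration. Registrations at the same practice that overlap or are separated by at most a hypothetical `max_gap_days` (30 here) are merged. When registrations at different practices overlap, the later one takes over from its start date, and any remainder of the earlier record after that point is dropped. The `Registration` record type and all function names are hypothetical. The last function computes the share of events that fall inside the resulting periods, the kind of check behind the reported 91.0% figure.

```python
from datetime import date, timedelta
from typing import NamedTuple


class Registration(NamedTuple):
    practice: str  # hypothetical practice identifier
    start: date
    end: date      # inclusive


def clean_registrations(records, max_gap_days=30):
    """Simplify one individual's registration history (assumed rules)."""
    cleaned = []
    for r in sorted(records, key=lambda rec: (rec.start, rec.end)):
        if cleaned:
            prev = cleaned[-1]
            gap = (r.start - prev.end).days - 1  # unregistered days in between
            if r.practice == prev.practice and gap <= max_gap_days:
                # Same practice, overlapping or a short gap: merge the records.
                cleaned[-1] = prev._replace(end=max(prev.end, r.end))
                continue
            if r.start <= prev.end:
                # Different practices overlap: the later registration wins.
                truncated = prev._replace(end=r.start - timedelta(days=1))
                if truncated.end >= truncated.start:
                    cleaned[-1] = truncated
                else:
                    cleaned.pop()  # nothing left of the earlier record
        cleaned.append(r)
    return cleaned


def available_periods(registrations, coverage):
    """Clip each registration to its practice's (start, end) data window."""
    periods = []
    for r in registrations:
        window = coverage.get(r.practice)
        if window is None:
            continue  # practice does not contribute data
        start, end = max(r.start, window[0]), min(r.end, window[1])
        if start <= end:
            periods.append(Registration(r.practice, start, end))
    return periods


def share_of_events_covered(event_dates, periods):
    """Fraction of recorded events that fall inside an available period."""
    if not event_dates:
        return float("nan")
    inside = sum(any(p.start <= d <= p.end for p in periods) for d in event_dates)
    return inside / len(event_dates)


regs = [
    Registration("A", date(2001, 1, 1), date(2001, 6, 30)),
    Registration("A", date(2001, 7, 15), date(2001, 12, 31)),  # 14-day gap
    Registration("B", date(2001, 12, 1), date(2002, 12, 31)),  # overlaps A
]
coverage = {
    "A": (date(2000, 1, 1), date(2010, 12, 31)),
    "B": (date(2002, 6, 1), date(2010, 12, 31)),  # B's data starts later
}
periods = available_periods(clean_registrations(regs), coverage)
events = [date(2001, 3, 1), date(2002, 1, 15), date(2002, 7, 1), date(2003, 1, 1)]
print(share_of_events_covered(events, periods))  # → 0.5
```

Keeping the cleaning rules separate from the coverage clipping mirrors the two analyses described in the abstract. Either rule set can be tuned, for example by changing `max_gap_days`, without touching the other.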

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Continuity of Patient Care / standards
  • Continuity of Patient Care / statistics & numerical data
  • Data Collection / methods*
  • Data Collection / standards
  • Databases, Factual
  • Datasets as Topic* / standards
  • Datasets as Topic* / statistics & numerical data
  • Diagnostic Tests, Routine / standards
  • Diagnostic Tests, Routine / statistics & numerical data
  • Electronic Health Records / organization & administration
  • Electronic Health Records / standards
  • Electronic Health Records / statistics & numerical data*
  • Female
  • Follow-Up Studies*
  • Humans
  • Incidence
  • Longitudinal Studies
  • Male
  • Medical Record Linkage* / standards
  • Practice Patterns, Physicians' / statistics & numerical data
  • Primary Health Care / organization & administration
  • Primary Health Care / standards
  • Primary Health Care / statistics & numerical data
  • Research Design
  • Stroke / drug therapy
  • Stroke / epidemiology
  • Stroke / prevention & control
  • Time Factors
  • Wales / epidemiology
  • Warfarin / therapeutic use

Substances

  • Warfarin