Improving Cohort-Hospital Matching Accuracy through Standardization and Validation of Participant Identifiable Information

Children (Basel). 2022 Dec 7;9(12):1916. doi: 10.3390/children9121916.

Abstract

Linking very large, consented birth cohorts to birthing hospitals' clinical data could elucidate the lifecourse outcomes of health care and exposures during the pregnancy, birth and newborn periods. Unfortunately, cohort personally identifiable information (PII) often does not include unique identifier numbers, which makes matching difficult. To develop optimized matching of cohort participants to birthing hospital clinical records, this pilot drew on a one-year (December 2020-December 2021) cohort from a single Australian birthing hospital participating in the whole-of-state Generation Victoria (GenV) study. For 1819 consented mother-baby pairs and 58 additional babies (whose mothers were not themselves participating), we tested the accuracy and effort of various approaches to matching. We selected demographic variables drawn from names, date of birth (DOB), sex, telephone and address (plus birth order for multiple births). After variable standardization and validation, accuracy rose from 10% to 99% using a deterministic rule-based approach in 10 steps. Using cohort-specific modifications of the Australian Statistical Linkage Key (SLK-581), it took only 3 steps to reach 97% (SLK-5881) and 98% (SLK-5881.1) accuracy. We conclude that our SLK-5881 process could safely and efficiently achieve high accuracy at the population level for future birth cohort-birth hospital matching in the absence of unique identifier numbers.
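
For orientation, the sketch below illustrates the general idea behind a statistical linkage key and a one-step deterministic match. It follows the commonly described construction of the base SLK-581 (2nd, 3rd and 5th letters of the family name, 2nd and 3rd letters of the given name, date of birth as DDMMYYYY, and a one-digit sex code, with "2" padding short names and "9" filling missing ones). The cohort-specific SLK-5881 and SLK-5881.1 modifications are not specified in this abstract and are not reproduced here. The function names, the record layout and the "unique key only" matching rule are illustrative assumptions, not the authors' implementation.

```python
import re
from datetime import date


def _letters(name):
    # Standardize: keep alphabetic characters only, uppercased
    # (drops hyphens, apostrophes and spaces).
    return re.sub(r"[^A-Za-z]", "", name or "").upper()


def _pick(name, positions):
    # Take the letters at the given 1-based positions.
    # Assumed padding rule: "2" where the name is too short,
    # "9" for every position where the name is missing entirely.
    letters = _letters(name)
    if not letters:
        return "9" * len(positions)
    return "".join(letters[p - 1] if p <= len(letters) else "2" for p in positions)


def slk581(family_name, given_name, dob, sex_code):
    # 3 family-name letters + 2 given-name letters + DDMMYYYY + sex code = 14 chars.
    return (
        _pick(family_name, (2, 3, 5))
        + _pick(given_name, (2, 3))
        + dob.strftime("%d%m%Y")
        + str(sex_code)
    )


def match_on_key(cohort, hospital):
    # Hypothetical one-step deterministic match: link a cohort record to a
    # hospital record only when the key occurs exactly once on each side.
    # Each input maps a record id to a (family, given, dob, sex_code) tuple.
    def index(records):
        by_key = {}
        for rec_id, fields in records.items():
            by_key.setdefault(slk581(*fields), []).append(rec_id)
        return by_key

    cohort_idx, hospital_idx = index(cohort), index(hospital)
    return {
        ids[0]: hospital_idx[key][0]
        for key, ids in cohort_idx.items()
        if len(ids) == 1 and len(hospital_idx.get(key, [])) == 1
    }
```

Because the key is built after stripping punctuation and case, variants such as "O'Brien" and "OBRIEN" produce the same key, which is the kind of gain that standardization is meant to deliver before any matching rule is applied. Records left unmatched by such a step would then pass to further rules or manual review.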

Keywords: birth cohort; data accuracy; data linkage; demographics; hospital; hospital records; information retrieval; newborn; personally identifiable information; pregnant women.

Grants and funding

This pilot work was conducted by Generation Victoria (GenV), which is supported by grants from the Paul Ramsay Foundation, the Victorian Government and the Royal Children’s Hospital Foundation. Research at the Murdoch Children’s Research Institute is supported by the Victorian Government’s Operational Infrastructure Support Program. J.W. was supported by a Melbourne Children’s LifeCourse postdoctoral fellowship, funded by a Royal Children’s Hospital Foundation grant (reference number 2018–984). S.G. was supported by an Australian National Health and Medical Research Council (NHMRC) Practitioner Fellowship (reference number 155290). M.W. was supported by an NHMRC Principal Research Fellowship (reference number 1160906). J.C. was supported by a Career Development Fellowship from the MRFF (reference number 1141354).