Estimating the Number of Persons with HIV in Jails via Web Scraping and Record Linkage

J R Stat Soc Ser A Stat Soc. 2022 Dec;185(Suppl 2):S270-S287. doi: 10.1111/rssa.12909. Epub 2022 Aug 10.

Abstract

This paper presents methods to estimate the number of persons with HIV in North Carolina jails by applying finite population inferential approaches to data collected using web scraping and record linkage techniques. Administrative data are linked with web-scraped rosters of incarcerated persons in a nonrandom subset of counties. Outcome regression and calibration weighting are adapted for state-level estimation. The methods are compared in simulations and applied to data from the US state of North Carolina. Outcome regression yielded more precise inference and allowed for county-level estimates, an important study objective. Calibration weighting was doubly robust: it remained valid when either the outcome model or the weight model was misspecified.

Keywords: Outcome Regression; Web Scraping; Weight Calibration.
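The abstract compares two estimators of a finite-population total. The sketch below is a minimal, hypothetical illustration, not the authors' implementation. It assumes that one county-level covariate is known for every county and that linked HIV counts are observed only in counties with scraped rosters. It also assumes a linear outcome model fitted by ordinary least squares and linear (chi-square distance) calibration. The function names, the covariate and the toy data are all invented for illustration.

```python
# Hypothetical sketch: outcome-regression vs. linear calibration estimators of a
# state-level total from a nonrandom subset of counties. Not the paper's code.
from typing import List, Optional, Sequence

Vector = List[float]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve the small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def outcome_regression_total(x_all: Sequence[Vector], sampled: Sequence[int], y_s: Vector) -> float:
    """Observed counts in sampled counties plus OLS predictions for unsampled ones."""
    xs = [x_all[i] for i in sampled]
    k = len(x_all[0])
    xtx = [[sum(r[a] * r[b] for r in xs) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(xs, y_s)) for a in range(k)]
    beta = solve(xtx, xty)
    seen = set(sampled)
    predicted = sum(
        sum(bj * xj for bj, xj in zip(beta, x_all[i]))
        for i in range(len(x_all)) if i not in seen
    )
    return sum(y_s) + predicted


def calibration_total(x_all: Sequence[Vector], sampled: Sequence[int], y_s: Vector,
                      base_w: Optional[Vector] = None):
    """Linear calibration: w_i = d_i * (1 + x_i' lam), with lam chosen so that the
    weighted sample totals of x match the known totals over all counties."""
    xs = [x_all[i] for i in sampled]
    k = len(x_all[0])
    d = base_w if base_w is not None else [1.0] * len(xs)
    t_pop = [sum(r[a] for r in x_all) for a in range(k)]
    t_hat = [sum(di * r[a] for di, r in zip(d, xs)) for a in range(k)]
    m = [[sum(di * r[a] * r[b] for di, r in zip(d, xs)) for b in range(k)] for a in range(k)]
    lam = solve(m, [tp - th for tp, th in zip(t_pop, t_hat)])
    w = [di * (1.0 + sum(l * xa for l, xa in zip(lam, r))) for di, r in zip(d, xs)]
    return sum(wi * yi for wi, yi in zip(w, y_s)), w


# Hypothetical toy data: intercept plus one county covariate (e.g. average daily
# jail population, in hundreds); counties 0-3 have scraped rosters.
X = [[1, 2], [1, 4], [1, 5], [1, 8], [1, 3], [1, 6]]
S = [0, 1, 2, 3]
Y = [3, 5, 7, 10]  # hypothetical linked HIV counts in the sampled counties

or_total = outcome_regression_total(X, S, Y)
cal_total, weights = calibration_total(X, S, Y)
print(round(or_total, 4), round(cal_total, 4))  # 36.9067 36.9067
```

With unit base weights and the same covariates in both models, linear calibration reproduces the regression-prediction total, so the two numbers coincide here. They diverge once `base_w` comes from a separate (assumed) selection or weight model, for example inverse estimated probabilities that a county's roster is scrapable. That is the setting in which the calibration estimator's double robustness becomes relevant.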