Evaluating a Targeted Minimum Loss-Based Estimator for Capture-Recapture Analysis: An Application to HIV Surveillance in San Francisco, California

Am J Epidemiol. 2024 Apr 8;193(4):673-683. doi: 10.1093/aje/kwad231.

Abstract

The capture-recapture method is a common tool used in epidemiology to estimate the size of "hidden" populations and correct the underascertainment of cases, based on incomplete and overlapping lists of the target population. Log-linear models are often used to estimate the population size yet may produce implausible and unreliable estimates due to model misspecification and small cell sizes. A novel targeted minimum loss-based estimation (TMLE) model developed for capture-recapture makes several notable improvements to conventional modeling: "targeting" the parameter of interest, flexibly fitting the data to alternative functional forms, and limiting bias from small cell sizes. Using simulations and empirical data from the San Francisco, California, Department of Public Health's human immunodeficiency virus (HIV) surveillance registry, we evaluated the performance of the TMLE model and compared results with those of other common models. Based on 2,584 people observed on 3 lists reportable to the surveillance registry, the TMLE model estimated the number of San Francisco residents living with HIV as of December 31, 2019, to be 13,523 (95% confidence interval: 12,222, 14,824). This estimate, compared with a "ground truth" of 12,507, was the most accurate and precise of all models examined. The TMLE model is a significant advancement in capture-recapture studies, leveraging modern statistical methods to improve estimation of the sizes of hidden populations.
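To make the underlying idea concrete, the sketch below shows the simplest capture-recapture calculation: Chapman's bias-corrected two-list estimator. It is not the TMLE approach or the log-linear models evaluated in the paper, both of which handle 3 lists and dependence between lists. The function name and the list counts are hypothetical and chosen only for illustration. The second half uses only the figures reported in the abstract to show how far the TMLE point estimate sits from the "ground truth" and how wide its confidence interval is.

```python
def chapman_estimate(n1, n2, m):
    """Chapman's bias-corrected two-list estimate of total population size.

    n1, n2: number of people observed on list 1 and list 2
    m:      number of people observed on both lists
    Assumes the two lists are independent and the population is closed.
    """
    if min(n1, n2, m) < 0 or m > min(n1, n2):
        raise ValueError("counts must be non-negative and m <= min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts, NOT taken from the paper.
n_hat = chapman_estimate(n1=200, n2=150, m=30)
print(f"Two-list Chapman estimate: {n_hat:.1f}")

# Figures reported in the abstract.
estimate, ci_low, ci_high = 13_523, 12_222, 14_824
ground_truth = 12_507

# Relative error of the point estimate against the "ground truth".
relative_error = (estimate - ground_truth) / ground_truth
# Half-width of the reported 95% confidence interval.
ci_half_width = (ci_high - ci_low) / 2

print(f"Relative error vs ground truth: {relative_error:.1%}")
print(f"95% CI half-width: {ci_half_width:.0f}")
```

With only two lists, the independence assumption cannot be checked from the data. That limitation is one reason multi-list models such as those compared in the paper are used in practice.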

Keywords: SuperLearner; bias; capture-recapture method; hidden populations; human immunodeficiency virus; machine learning; prevalence estimation; targeted minimum loss-based estimation.

MeSH terms

  • Bias
  • HIV Infections* / epidemiology
  • HIV*
  • Humans
  • Linear Models
  • San Francisco / epidemiology