Correcting prevalence estimation for biased sampling with testing errors

Stat Med. 2023 Nov 20;42(26):4713-4737. doi: 10.1002/sim.9885. Epub 2023 Sep 1.

Abstract

Samples used to estimate the prevalence of an infection are subject to bias from two sources: oversampling of symptomatic individuals and error-prone tests. As a result, naïve estimators of prevalence (ie, the proportion of infected individuals observed in the sample) can be very far from the true proportion of infected individuals. In this work, we present a prevalence estimation method that reduces the bias due to both testing errors and oversampling of symptomatic individuals, and eliminates it altogether in some scenarios. Moreover, the procedure accounts for stratified errors, in which tests have different error-rate profiles for symptomatic and asymptomatic individuals. The result is a set of easily implementable algorithms, for which code is provided, that produce better prevalence estimates than other methods (in terms of reducing and/or removing bias), as demonstrated by formal results, simulations, and an analysis of COVID-19 data from the Israeli Ministry of Health.
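The abstract does not spell out the estimator, so the following is not the authors' method, which is built on active information and maximum entropy. It is a minimal sketch of a standard textbook correction, shown only to make the two sources of bias concrete. It assumes that each stratum's sensitivity and specificity are known and that the population share of symptomatic individuals is known. Under those assumptions, the sketch applies a per-stratum Rogan-Gladen correction for test error and then reweights the strata by their population shares rather than their sample shares. All function names, parameters, and numbers are hypothetical.

```python
"""Hypothetical illustration: stratified test-error correction plus reweighting.

NOT the method of the paper. A standard Rogan-Gladen correction applied per
stratum, ASSUMING known sensitivity/specificity in each stratum and a known
population share of symptomatic individuals.
"""


def naive_prevalence(pos_s, n_s, pos_a, n_a):
    """Share of positive tests in the pooled sample (ignores both biases)."""
    return (pos_s + pos_a) / (n_s + n_a)


def rogan_gladen(p_obs, sensitivity, specificity):
    """Correct an apparent positive rate for test error within one stratum.

    Solves P(test+) = se * p + (1 - sp) * (1 - p) for the true rate p.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    p = (p_obs + specificity - 1.0) / denom
    return min(max(p, 0.0), 1.0)  # clip sampling noise back into [0, 1]


def stratified_prevalence(pos_s, n_s, pos_a, n_a,
                          se_s, sp_s, se_a, sp_a, share_symptomatic):
    """Correct each stratum with its own error rates, then weight the strata
    by their (assumed known) population shares instead of sample shares."""
    p_s = rogan_gladen(pos_s / n_s, se_s, sp_s)  # symptomatic stratum
    p_a = rogan_gladen(pos_a / n_a, se_a, sp_a)  # asymptomatic stratum
    return share_symptomatic * p_s + (1.0 - share_symptomatic) * p_a


if __name__ == "__main__":
    # Hypothetical scenario: true prevalence 0.5 among symptomatic and 0.02
    # among asymptomatic individuals, 5% of the population symptomatic, so
    # the true overall prevalence is 0.05*0.5 + 0.95*0.02 = 0.044.
    # Expected positive counts under the assumed error rates:
    #   symptomatic:  1000 * (0.5*0.90 + 0.5*0.02)   = 460
    #   asymptomatic: 5000 * (0.02*0.70 + 0.98*0.01) = 119
    naive = naive_prevalence(460, 1000, 119, 5000)  # 579 / 6000
    corrected = stratified_prevalence(460, 1000, 119, 5000,
                                      se_s=0.90, sp_s=0.98,
                                      se_a=0.70, sp_a=0.99,
                                      share_symptomatic=0.05)
    print(f"naive:     {naive:.4f}")
    print(f"corrected: {corrected:.4f}")
```

In this idealized example, the naïve estimate is more than double the true value of 0.044, while the corrected estimate recovers it. Both correction steps are exact only when the error rates and the symptomatic share are known. If those quantities must themselves be estimated, uncertainty is reintroduced.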

Keywords: COVID-19; active information; bias correction; maximum entropy; prevalence; sampling; sampling bias; testing errors.