Correcting prevalence estimation for biased sampling with testing errors

Lili Zhou; Daniel Andrés Díaz-Pachón; Chen Zhao; J Sunil Rao; Ola Hössjer

doi:10.1002/sim.9885

Stat Med. 2023 Nov 20;42(26):4713-4737. doi: 10.1002/sim.9885. Epub 2023 Sep 1.

Authors

Lili Zhou¹, Daniel Andrés Díaz-Pachón¹, Chen Zhao¹, J Sunil Rao², Ola Hössjer³

Affiliations

¹ Division of Biostatistics, University of Miami, Miami, Florida, USA.
² Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA.
³ Department of Mathematics, Stockholm University, Stockholm, Sweden.

PMID: 37655557
DOI: 10.1002/sim.9885

Abstract

Sampling for prevalence estimation of infection is subject to bias from both oversampling of symptomatic individuals and error-prone tests. As a result, naïve estimators of prevalence (i.e., the proportion of observed infected individuals in the sample) can be very far from the true proportion of infected individuals. In this work, we present a method of prevalence estimation that reduces the bias due to both testing errors and oversampling of symptomatic individuals, eliminating it altogether in some scenarios. Moreover, the procedure accommodates stratified errors, in which tests have different error rate profiles for symptomatic and asymptomatic individuals. The resulting algorithms are easy to implement, come with code, and produce better prevalence estimates than other methods (in terms of reducing or removing bias), as demonstrated by formal results, simulations, and COVID-19 data from the Israeli Ministry of Health.
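
To make the two sources of bias concrete, the sketch below contrasts the naïve estimator with a simple stratified correction. It is not the authors' algorithm, which the abstract does not spell out. It uses the classical Rogan-Gladen correction for testing errors within each symptom stratum, then reweights the strata by an assumed population share of symptomatic individuals. All function names, counts, sensitivities, specificities, and the symptomatic share are hypothetical values chosen for illustration.

```python
def rogan_gladen(observed_rate, sensitivity, specificity):
    """Classical Rogan-Gladen correction of an observed positive rate.

    Not the paper's estimator; a standard textbook correction used here
    only to illustrate how testing errors distort the naive proportion.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    corrected = (observed_rate + specificity - 1.0) / denom
    # Clip to [0, 1], since the raw correction can fall outside that range.
    return min(1.0, max(0.0, corrected))


def stratified_prevalence(strata, symptomatic_share):
    """Correct each stratum for testing error, then reweight the strata.

    strata: {"symptomatic": (positives, tested, sens, spec),
             "asymptomatic": (positives, tested, sens, spec)}
    symptomatic_share: ASSUMED known population proportion of symptomatic
    individuals; reweighting by it offsets oversampling of that stratum.
    """
    weights = {
        "symptomatic": symptomatic_share,
        "asymptomatic": 1.0 - symptomatic_share,
    }
    total = 0.0
    for name, (positives, tested, sens, spec) in strata.items():
        total += weights[name] * rogan_gladen(positives / tested, sens, spec)
    return total


# Hypothetical sample: symptomatic individuals are heavily oversampled,
# and the test has a different error profile in each stratum.
sample = {
    "symptomatic": (300, 600, 0.90, 0.95),
    "asymptomatic": (20, 400, 0.70, 0.98),
}

# Naive estimator: proportion of observed positives in the whole sample.
naive = sum(s[0] for s in sample.values()) / sum(s[1] for s in sample.values())

# Stratified correction, assuming 10% of the population is symptomatic.
corrected = stratified_prevalence(sample, symptomatic_share=0.10)

print(f"naive estimate:     {naive:.4f}")
print(f"corrected estimate: {corrected:.4f}")
```

With these made-up inputs the naïve estimate is dominated by the oversampled symptomatic stratum, while the reweighted estimate is much lower. The reweighting step relies on knowing the population share of symptomatic individuals, which is a strong assumption in practice.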

Keywords: COVID-19; active information; bias correction; maximum entropy; prevalence; sampling; sampling bias; testing errors.

Grants and funding

2022/Copeland Foundation Award, Department of Public Health Sciences, University of Miami, 2022