A statistical algorithm for outbreak detection in multisite settings: an application to sick leave monitoring

Bioinform Adv. 2023 Jun 14;3(1):vbad079. doi: 10.1093/bioadv/vbad079. eCollection 2023.

Abstract

Motivation: Public health authorities monitor cases of health-related problems over time using surveillance algorithms that detect unusually large increases in the number of cases, known as aberrations. A statistical aberration signals an outbreak when further investigation reveals epidemiological significance. The increasing availability and diversity of epidemiological data, together with recent epidemic threats, call for more accurate surveillance algorithms that detect not only when aberrations occur but also where. Sick leave data, for instance, can be monitored across companies to identify company-specific aberrations. In this context, we extend a routinely used aberration detection algorithm, the quasi-Poisson regression Farrington Flexible algorithm, to multisite surveillance. The new algorithm consists of a negative-binomial mixed effects regression model with a random effects term for sites and a new reweighting procedure that reduces the effect of past aberrations.

Results: A wide range of simulations shows that, compared with Farrington Flexible, the new algorithm produces better false positive rates and similar probabilities of detecting genuine outbreaks for case counts that exceed historical baselines by 3 standard deviations (SD). As expected, larger surges lead to lower false positive rates and higher probabilities of detecting true outbreaks. When cases exceed the baseline by 8 SD, the new algorithm provides better detection of true outbreaks, reaching 100%. We apply our algorithm to sick leave rates in the context of COVID-19 and find that it detects the effect of the pandemic. The new algorithm is easy to implement across a range of contrasting data scenarios, provides good overall performance and offers new perspectives for multisite surveillance.
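
To make the notion of a count "exceeding the historical baseline by k SD" concrete, the following is a minimal illustrative sketch in Python. It is not the authors' algorithm (which fits a negative-binomial mixed effects model in R with glmmTMB) and it is not Farrington Flexible; it only flags, for each site separately, a current count that lies more than k sample standard deviations above that site's historical mean. The function name `flag_aberrations`, the site labels and all counts are hypothetical.

```python
from statistics import mean, stdev


def flag_aberrations(history, current, k=3.0):
    """Flag sites whose current count exceeds baseline mean + k * SD.

    history: dict mapping site name -> list of past counts (hypothetical data)
    current: dict mapping site name -> count observed in the current period
    k:       number of baseline standard deviations used as the threshold
    """
    flags = {}
    for site, past_counts in history.items():
        baseline = mean(past_counts)       # historical baseline for this site
        spread = stdev(past_counts)        # sample SD of the historical counts
        threshold = baseline + k * spread  # simple upper threshold (assumption)
        flags[site] = current[site] > threshold
    return flags


# Made-up weekly counts for two hypothetical sites.
history = {
    "site_a": [10, 12, 11, 9, 10, 13, 11, 10],
    "site_b": [40, 38, 42, 41, 39, 40, 43, 37],
}
current = {"site_a": 25, "site_b": 44}

flags = flag_aberrations(history, current)
print(flags)
```

With these made-up numbers, site_a is flagged and site_b is not. A per-site rule of this kind ignores trend, seasonality, overdispersion and information shared across sites, which are the aspects a regression-based multisite model is designed to handle.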

Availability and implementation: All analyses are performed in the R statistical software using the glmmTMB package. The code for performing the analyses and generating the simulations is available at https://github.com/TomDuchemin/mixed_surveillance.

Contact: a.noufaily@warwick.ac.uk.