Variable screening methods in spatial infectious disease transmission models

Spat Spatiotemporal Epidemiol. 2023 Nov:47:100622. doi: 10.1016/j.sste.2023.100622. Epub 2023 Oct 21.

Abstract

Data-driven mathematical modelling can greatly enrich our understanding of infectious disease spread. Individual-level models of infectious disease transmission can incorporate individual-level covariates such as spatial location and vaccination status. This study explores and develops methods for fitting such models when many potential covariates are available for inclusion. The aim is to improve the performance and interpretability of the models and to ease the computational burden of fitting them to data. We apply and compare multiple variable selection methods in the context of spatial epidemic data: a Bayesian two-stage least absolute shrinkage and selection operator (Lasso), forward and backward stepwise selection based on the Akaike information criterion (AIC), spike-and-slab priors, and random variable selection (boosting) methods. We compare the performance of these methods on simulated datasets and on data from the 2001 UK foot-and-mouth disease epidemic. All of the variable selection methods performed consistently well except the two-stage Lasso. We conclude that the spike-and-slab prior method is to be recommended, as it consistently gave high accuracy with short computational time.
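
To illustrate how individual-level covariates can enter a spatial transmission model, and why selecting among them matters, the following is a minimal sketch only. It assumes a commonly used individual-level model form in which the probability that susceptible individual i is infected in one time step is 1 - exp(-(susceptibility x infectious pressure + epsilon)), with a linear covariate term for susceptibility and a power-law distance kernel. The abstract does not state the model used in the paper, so the function name, parameter names (alpha0, alphas, beta, epsilon) and toy data here are all hypothetical.

```python
import math


def infection_probability(i, infectious, coords, covariates,
                          alpha0, alphas, beta, epsilon=0.0):
    """One-step infection probability for susceptible i (assumed ILM form)."""
    # Susceptibility: baseline plus linear covariate effects (an assumption).
    susceptibility = alpha0 + sum(a * x for a, x in zip(alphas, covariates[i]))
    # Infectious pressure: power-law distance kernel summed over infectious
    # individuals (also an assumption, not taken from the abstract).
    pressure = sum(math.dist(coords[i], coords[j]) ** (-beta)
                   for j in infectious)
    return 1.0 - math.exp(-(susceptibility * pressure + epsilon))


# Hypothetical toy data: three individuals, two covariates each.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 2.0)}
covariates = {0: (1.0, 0.0), 1: (0.0, 1.0), 2: (1.0, 1.0)}

p = infection_probability(0, infectious=[1, 2], coords=coords,
                          covariates=covariates,
                          alpha0=0.1, alphas=(0.5, 0.0), beta=2.0)
print(round(p, 4))
```

In this sketch, a covariate whose coefficient is zero has no effect on the infection probability. Variable selection methods of the kind compared in the study aim to identify such covariates so that they can be dropped from the model.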

Keywords: AIC; Boosting; Individual-level models; Spike-and-slab prior; Two-stage Lasso; Variable selection.

MeSH terms

  • Animals
  • Bayes Theorem
  • Communicable Diseases* / transmission
  • Humans
  • Models, Theoretical*