Bayesian inference on quasi-sparse count data

Biometrika. 2016 Dec;103(4):971-983. doi: 10.1093/biomet/asw053. Epub 2016 Dec 8.

Abstract

There is growing interest in analysing high-dimensional count data, which often exhibit quasi-sparsity corresponding to an overabundance of zeros and small nonzero counts. Existing methods for analysing multivariate count data via Poisson or negative binomial log-linear hierarchical models with zero-inflation cannot flexibly adapt to quasi-sparse settings. We develop a new class of continuous local-global shrinkage priors tailored to quasi-sparse counts. Theoretical properties are assessed, including flexible posterior concentration and stronger control of false discoveries in multiple testing. Simulation studies demonstrate excellent small-sample properties relative to competing methods. We use the method to detect rare mutational hotspots in exome sequencing data and to identify North American cities most impacted by terrorism.
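
To make the notion of quasi-sparsity concrete, the following is a minimal simulation sketch, not the model or prior developed in the paper. It assumes a three-part mixture: exact zeros, small counts from a low-rate Poisson, and a few large signals from a high-rate Poisson. The function names (`sample_poisson`, `simulate_quasi_sparse`), the mixture weights and the rates are all hypothetical choices made for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate using Knuth's multiplication method."""
    if rate <= 0:
        return 0
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    # Multiply uniforms until the running product drops below exp(-rate).
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_quasi_sparse(n, p_zero=0.6, p_small=0.35,
                          small_rate=0.5, large_rate=20.0, seed=0):
    """Simulate n counts from an illustrative quasi-sparse mixture.

    All default weights and rates are assumptions for this sketch,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    counts, labels = [], []
    for _ in range(n):
        u = rng.random()
        if u < p_zero:
            # Structural zero.
            labels.append("zero")
            counts.append(0)
        elif u < p_zero + p_small:
            # Small nonzero-or-zero count: noise rather than signal.
            labels.append("small")
            counts.append(sample_poisson(small_rate, rng))
        else:
            # Rare large count: the signal of interest.
            labels.append("signal")
            counts.append(sample_poisson(large_rate, rng))
    return counts, labels


if __name__ == "__main__":
    counts, labels = simulate_quasi_sparse(2000)
    frac_zero = sum(c == 0 for c in counts) / len(counts)
    frac_small = sum(0 < c <= 2 for c in counts) / len(counts)
    print(f"fraction of zeros: {frac_zero:.3f}")
    print(f"fraction of counts in 1..2: {frac_small:.3f}")
    print(f"largest count: {max(counts)}")
```

In data of this kind, a model with zero-inflation alone accounts for the excess zeros but not for the many small nonzero counts, which is the setting the abstract describes as quasi-sparse.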

Keywords: Count data; High-dimensional data; Local-global shrinkage; Rare variant; Shrinkage prior; Zero-inflation.