A web application for sample size and power calculation in case-control microbiome studies

Federico Mattiello; Bie Verbist; Karoline Faust; Jeroen Raes; William D Shannon; Luc Bijnens; Olivier Thas

doi:10.1093/bioinformatics/btw099

Bioinformatics. 2016 Jul 1;32(13):2038-40. doi: 10.1093/bioinformatics/btw099. Epub 2016 Feb 19.

Authors

Federico Mattiello¹, Bie Verbist², Karoline Faust³, Jeroen Raes³, William D Shannon⁴, Luc Bijnens², Olivier Thas⁵

Affiliations

¹ Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, Gent, 9000, Belgium.
² Janssen Pharmaceutica, Turnhoutseweg 30, Beerse, 2340, Belgium.
³ KU Leuven, Laboratory of Molecular Bacteriology and Department of Microbiology and Immunology, Herestraat 49, Leuven, 3000, Belgium.
⁴ BioRankings, 4041 Forest Park Ave, St. Louis, MO 63108, USA.
⁵ Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, Gent, 9000, Belgium; University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, Australia.

PMID: 27153704
DOI: 10.1093/bioinformatics/btw099

Abstract

When designing a case-control study to investigate differences in microbial composition, it is fundamental to assess the sample sizes needed to detect a hypothesized difference with sufficient statistical power. Our application includes power calculation for (i) a recoded version of the two-sample generalized Wald test of the 'HMP' R-package for comparing community composition, and (ii) optionally, the Wilcoxon-Mann-Whitney test for comparing operational taxonomic unit-specific abundances between two samples. The simulation-based power calculations use the Dirichlet-Multinomial model to describe and generate abundances. The web interface allows for easy specification of sample and effect sizes. As an illustration of our application, we compared the statistical power of the two tests, with and without stratification of samples. We observed that statistical power increases considerably when stratification is employed, meaning that fewer samples are needed to detect the same effect size with the same power.
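The simulation-based idea described above can be illustrated with a minimal sketch. It is not the authors' R implementation: it is a standard-library Python stand-in that covers only the OTU-specific Wilcoxon-Mann-Whitney comparison, uses a normal approximation (with tie correction, no continuity correction) for the p-value, and leaves out the generalized Wald test and stratification. All function names, parameter names and Dirichlet parameters below are hypothetical and chosen for illustration only.

```python
import math
import random


def sample_dirichlet_multinomial(rng, alpha, depth):
    """Draw one sample of OTU counts: p ~ Dirichlet(alpha), counts ~ Multinomial(depth, p)."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    counts = [0] * len(alpha)
    for otu in rng.choices(range(len(alpha)), weights=probs, k=depth):
        counts[otu] += 1
    return counts


def mann_whitney_p(x, y):
    """Two-sided Wilcoxon-Mann-Whitney p-value via the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # all observations identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2.0))


def estimate_power(alpha_control, alpha_case, otu, n_per_group,
                   depth=200, n_sim=100, level=0.05, seed=1):
    """Fraction of simulated case-control studies in which the test rejects at `level`."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        controls = [sample_dirichlet_multinomial(rng, alpha_control, depth)[otu]
                    for _ in range(n_per_group)]
        cases = [sample_dirichlet_multinomial(rng, alpha_case, depth)[otu]
                 for _ in range(n_per_group)]
        if mann_whitney_p(controls, cases) < level:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    # Illustrative (made-up) Dirichlet parameters for three OTUs.
    alpha_control = [20.0, 30.0, 50.0]
    alpha_case = [40.0, 25.0, 35.0]
    for n in (5, 10, 15):
        print(n, estimate_power(alpha_control, alpha_case, otu=0, n_per_group=n))
```

Running the estimate over a grid of group sizes, as in the usage lines at the end, gives a rough power curve from which a sample size can be read off; setting the case parameters equal to the control parameters should return a rejection rate close to the nominal level.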

Availability and implementation: The web interface is written in R using Shiny (RStudio Inc., 2016) and is available at https://fedematt.shinyapps.io/shinyMB. The R code for the recoded generalized Wald test can be found at https://github.com/mafed/msWaldHMP.

Contact: Federico.Mattiello@UGent.be

MeSH terms

Case-Control Studies
Computational Biology / methods*
Humans
Internet
Microbiota*
Models, Theoretical
Sample Size
Software*