Analyzing differences between microbiome communities using mixture distributions

Stat Med. 2018 Nov 30;37(27):4036-4053. doi: 10.1002/sim.7896. Epub 2018 Jul 23.

Abstract

In this paper, we present a method to assess differences between microbiome communities that effectively models sparse count data and accounts for presence-absence bias frequently encountered when zeros are present. We assume that the observed data for each operational taxonomic unit are Poisson-generated, with the rate for each sample drawn from an underlying rate distribution. We propose to model this distribution using a mixture model, specifying the components based on the posterior rate distribution of a count and estimating the optimal weights using a least squares objective function. The distribution incorporates varying resolutions of samples, a point mass for differentiating structural and nonstructural zeros, and a truncation point mass to account for high values that are too sparse to model. As mixture component specification is not always straightforward, a method to estimate a joint model from several mixture distributions using minimum distances of bootstrap iterates is proposed. Once the population rate distribution is approximated, we obtain sample-specific distributions by conditioning on the observed operational taxonomic unit count, resolution, and estimated mixture distribution and then use these to estimate pairwise distances for a permutation test. The method gives an accurate estimate of the true proportion of zeros for presence-absence, effectively models the distribution of low counts using the mixture distribution, and achieves good power for detecting differences in a variety of scenarios. The method is tested using a simulation study and applied to two microbiome datasets. In each case, the results are compared with a number of existing methods.
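The pipeline described in the abstract (mixture rate distribution, least-squares weights, conditioning on each sample's count, pairwise distances, permutation test) can be illustrated for a single OTU with the minimal sketch below. It is not the authors' implementation, and several choices are assumptions made only for illustration: rates live on a discretized grid whose first point is 0; sample resolution acts as a multiplicative depth on the Poisson mean; each mixture component is the Gamma(c + 1, ref_depth) posterior of a count c under a flat prior; the weights are fitted by projected gradient descent on the simplex; distances between sample-specific rate distributions are 1-D Wasserstein distances; and the test statistic is mean between-group minus mean within-group distance. The truncation point mass and the bootstrap joint-model step are omitted. All function names (`build_components`, `fit_weights`, `sample_posterior`, and so on) and the toy counts are hypothetical.

```python
import math
import random

def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu); mu = 0 puts all mass on y = 0."""
    if mu == 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def gamma_on_grid(shape, rate, grid):
    """Gamma(shape, rate) density on the positive grid points, renormalized to sum to 1."""
    dens = [math.exp(shape * math.log(rate) + (shape - 1) * math.log(x) - rate * x
                     - math.lgamma(shape)) if x > 0 else 0.0 for x in grid]
    total = sum(dens)
    return [v / total for v in dens]

def build_components(grid, max_count, ref_depth):
    """Point mass at rate 0 (structural zero) plus one posterior-rate component per count."""
    comps = [[1.0] + [0.0] * (len(grid) - 1)]  # assumes grid[0] == 0.0
    for c in range(max_count + 1):
        # Assumed component: posterior of the rate given count c at ref_depth, flat prior.
        comps.append(gamma_on_grid(c + 1, ref_depth, grid))
    return comps

def design_matrix(max_count, depths, comps, grid):
    """A[y][k]: probability of count y under component k, averaged over sample depths."""
    return [[sum(c * poisson_pmf(y, d * lam) for d in depths for c, lam in zip(comp, grid))
             / len(depths) for comp in comps] for y in range(max_count + 1)]

def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u, css, theta = sorted(v, reverse=True), 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        if uj - (css - 1.0) / j > 0:
            theta = (css - 1.0) / j
    return [max(x - theta, 0.0) for x in v]

def fit_weights(A, b, iters=2000):
    """Least squares ||A w - b||^2 over the simplex, by projected gradient descent."""
    k = len(A[0])
    w = [1.0 / k] * k
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))  # 1 / Lipschitz bound
    for _ in range(iters):
        resid = [sum(a * wj for a, wj in zip(row, w)) - by for row, by in zip(A, b)]
        grad = [2.0 * sum(A[y][j] * resid[y] for y in range(len(A))) for j in range(k)]
        w = project_simplex([wj - step * gj for wj, gj in zip(w, grad)])
    return w

def sample_posterior(y, depth, grid, g):
    """Rate distribution of one sample, conditioned on its count, depth and the mixture g."""
    post = [gl * poisson_pmf(y, depth * lam) for gl, lam in zip(g, grid)]
    total = sum(post)
    return [p / total for p in post]

def w1_distance(p, q, grid):
    """1-D Wasserstein distance between two distributions on the same sorted grid."""
    dist, cp, cq = 0.0, 0.0, 0.0
    for i in range(len(grid) - 1):
        cp, cq = cp + p[i], cq + q[i]
        dist += abs(cp - cq) * (grid[i + 1] - grid[i])
    return dist

def permutation_test(D, labels, n_perm=999, seed=0):
    """Statistic: mean between-group distance minus mean within-group distance."""
    def stat(lab):
        between, within = [], []
        for i in range(len(lab)):
            for j in range(i + 1, len(lab)):
                (between if lab[i] != lab[j] else within).append(D[i][j])
        return sum(between) / len(between) - sum(within) / len(within)
    rng, perm = random.Random(seed), list(labels)
    observed, hits = stat(labels), 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        hits += stat(perm) >= observed
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data for one OTU: group A mostly zeros, group B clearly present.
counts = [0, 0, 0, 1, 0, 0, 5, 7, 6, 8, 4, 6]
depths = [1.0] * len(counts)  # assumed relative sample resolutions
labels = ["A"] * 6 + ["B"] * 6
grid = [0.0] + [0.1 * i for i in range(1, 151)]
max_count = 10
comps = build_components(grid, max_count, ref_depth=1.0)
A = design_matrix(max_count, depths, comps, grid)
b = [counts.count(y) / len(counts) for y in range(max_count + 1)]
weights = fit_weights(A, b)
g = [sum(w * comp[idx] for w, comp in zip(weights, comps)) for idx in range(len(grid))]
posts = [sample_posterior(y, d, grid, g) for y, d in zip(counts, depths)]
D = [[w1_distance(p, q, grid) for q in posts] for p in posts]
statistic, p_value = permutation_test(D, labels)
print(f"structural-zero weight {weights[0]:.3f}, statistic {statistic:.3f}, p = {p_value:.3f}")
```

Because the mixture's predicted count frequencies are linear in the weights, the least-squares fit in this sketch is a convex problem over the simplex, so a simple projected-gradient loop is enough; a dedicated constrained solver would serve equally well.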

Keywords: microbiome; mixture models; sparsity; statistical ecology; zero inflation.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / classification*
  • Bias
  • Humans
  • Least-Squares Analysis
  • Microbiota*
  • Models, Statistical
  • Poisson Distribution
  • Statistics as Topic*
