A simple statistical test of taxonomic or functional homogeneity using replicated microbiome sequencing samples

J Biotechnol. 2017 May 20;250:45-50. doi: 10.1016/j.jbiotec.2016.10.020. Epub 2016 Oct 27.

Abstract

One important question in microbiome analysis is how to assess the homogeneity of the microbial composition in a given environment, with respect to a given analysis method. Do different microbial samples taken from the same environment follow the same taxonomic distribution of organisms, or the same distribution of functions? Here we provide a non-parametric statistical "triangulation test" to address this type of question. The test requires that multiple replicates are available for each of the biological samples, and it is based on three-way computational comparisons of samples. To illustrate the application of the test, we collected three biological samples taken from different locations in one piece of human stool, each represented by three replicates, and analyzed them using MEGAN. (Despite its name, the triangulation test does not require that the number of biological samples or replicates be three.) The triangulation test rejects the null hypothesis that the three biological samples exhibit the same distribution of taxa or function (error probability ≤0.05). This indicates that the microbial composition of the investigated human stool is not homogeneous on a macroscopic scale and suggests that pooling material from multiple locations is a reasonable practice. We provide an implementation of the test in our open source program MEGAN Community Edition.
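The idea of a three-way comparison test can be illustrated with a small sketch. This is not the MEGAN implementation, and the paper's exact choice of triangles and distance may differ. It assumes a triangle consists of two replicates of one biological sample plus one replicate of a different sample. Under the null hypothesis that all samples share one distribution, the three pairwise distances in a triangle are exchangeable. The same-sample pair should therefore be strictly closest with probability at most 1/3, and an excess of such triangles is evidence against homogeneity. The Bray-Curtis distance and the names `triangle_test`, `bray_curtis` and `binomial_upper_tail` are illustrative assumptions. Overlapping triangles are not independent, so the binomial p-value here is only approximate.

```python
from itertools import combinations
from math import comb


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two equal-length abundance profiles (assumed metric)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def triangle_test(samples, dist=bray_curtis):
    """Hypothetical triangulation-style test.

    samples: dict mapping a biological sample name to a list of replicate profiles.
    Each triangle = two replicates of sample a + one replicate of another sample b.
    A "success" is a triangle whose same-sample edge is strictly the shortest.
    Returns (successes, triangles, approximate one-sided p-value under p = 1/3).
    """
    successes = trials = 0
    for a in samples:
        for b in samples:
            if a == b:
                continue
            for r1, r2 in combinations(samples[a], 2):
                d_same = dist(r1, r2)
                for s in samples[b]:
                    if d_same < dist(r1, s) and d_same < dist(r2, s):
                        successes += 1
                    trials += 1
    return successes, trials, binomial_upper_tail(successes, trials, 1 / 3)


if __name__ == "__main__":
    # Toy data (invented): three samples, three replicates each, over three taxa.
    toy = {
        "A": [[10, 0, 0], [11, 0, 0], [10, 1, 0]],
        "B": [[0, 10, 0], [0, 11, 0], [1, 10, 0]],
        "C": [[0, 0, 10], [0, 0, 11], [0, 1, 10]],
    }
    print(triangle_test(toy))
```

With three samples of three replicates, this enumerates 6 ordered sample pairs × 3 replicate pairs × 3 replicates = 54 triangles. Requiring a strict inequality treats ties as failures, which keeps the sketch on the conservative side.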

Keywords: Environmental inhomogeneity; Functional diversity; Metagenomics; Statistical testing; Taxonomic composition.

MeSH terms

  • Algorithms*
  • Bacteria / classification
  • Bacteria / genetics*
  • Bacteria / isolation & purification*
  • Bacterial Typing Techniques / methods*
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Feces / microbiology
  • High-Throughput Screening Assays / methods
  • Humans
  • Microbiota / genetics*
  • Models, Statistical
  • Reproducibility of Results
  • Sample Size
  • Sensitivity and Specificity
  • Sequence Analysis, DNA / methods*