An adaptive direction-assisted test for microbiome compositional data

Bioinformatics. 2022 Jul 11;38(14):3493-3500. doi: 10.1093/bioinformatics/btac361.

Abstract

Motivation: Microbial communities have been shown to be associated with many complex diseases, such as cancers and cardiovascular diseases. Identifying differentially abundant taxa is clinically important: it can help explain the pathology of complex diseases and potentially suggest preventive and therapeutic strategies. Appropriate differential analysis of microbiome data is challenging because of the data's distinctive characteristics, including the compositional constraint, excessive zeros and high dimensionality. Most existing approaches either ignore these characteristics or account only for the compositional constraint by using log-ratio transformations with zero observations replaced by a pseudocount. However, there is no consensus on how to choose a pseudocount. More importantly, ignoring the excess of zeros may result in poorly powered analyses and therefore yield misleading findings.
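The pseudocount issue raised above can be illustrated with a minimal sketch. It assumes a centered log-ratio (CLR) transform as the log-ratio transformation; the function name `clr_with_pseudocount` and the toy counts are hypothetical and are not taken from the paper or its software.

```python
import numpy as np

def clr_with_pseudocount(counts, pseudocount=0.5):
    """Centered log-ratio transform after adding a pseudocount to every entry.

    counts: array of shape (samples, taxa) with non-negative counts.
    """
    shifted = np.asarray(counts, dtype=float) + pseudocount
    log_shifted = np.log(shifted)
    # Subtract each sample's mean log value (the log of its geometric mean).
    return log_shifted - log_shifted.mean(axis=1, keepdims=True)

# Hypothetical toy counts with several zeros (2 samples, 4 taxa).
counts = np.array([[120, 0, 30, 0],
                   [15, 4, 0, 81]])

# The transformed values for the same sample shift with the pseudocount.
for pc in (1e-6, 0.5, 1.0):
    print(pc, np.round(clr_with_pseudocount(counts, pc)[0], 2))
```

Because the zeros dominate the geometric mean, the transformed values of the non-zero taxa also move as the pseudocount changes, which is one way to see why the choice matters.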

Results: We develop a novel microbiome-based direction-assisted test (MiDAT) for detecting an overall difference in microbial relative abundances between two health conditions, which simultaneously incorporates the characteristics of relative abundance data. The proposed test (i) divides the taxa into two clusters according to the directions of the mean differences in relative abundance and then combines them at the cluster level, in light of the compositional characteristic; and (ii) contains a burden-type test, which collapses multiple taxa into a single one to account for excessive zeros. Moreover, the proposed test is an adaptive procedure, which can accommodate high-dimensional settings and yield high power against various alternative hypotheses. We perform extensive simulation studies across a wide range of scenarios to evaluate the proposed test and show its substantial power gain over some existing tests. The superiority of the proposed approach is further demonstrated with real datasets from two microbiome studies.
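The two ingredients described above, clustering taxa by the direction of the mean difference and collapsing each cluster into a single burden, can be sketched as follows. This is a simplified illustration under stated assumptions, not the statistic implemented in the MiDAT package: the function names, the squared standardized-difference form, the permutation calibration and the simulated Dirichlet data are all hypothetical, and the adaptive component of the proposed test is not reproduced.

```python
import numpy as np

def direction_burden_stat(x, y):
    """Illustrative direction-split burden statistic (not the paper's formula).

    x, y: arrays of shape (samples, taxa) holding relative abundances
    for the two conditions.
    """
    diff = x.mean(axis=0) - y.mean(axis=0)
    stat = 0.0
    # Two clusters: taxa with a higher mean in x, and taxa with a higher mean in y.
    for mask in (diff > 0, diff < 0):
        if not mask.any():
            continue
        # Burden step: collapse the taxa of the cluster into one summed abundance.
        bx = x[:, mask].sum(axis=1)
        by = y[:, mask].sum(axis=1)
        var = bx.var(ddof=1) / len(bx) + by.var(ddof=1) / len(by)
        if var > 0:
            stat += (bx.mean() - by.mean()) ** 2 / var
    return stat

def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value; clusters are re-derived for every permuted split."""
    rng = np.random.default_rng(seed)
    observed = direction_burden_stat(x, y)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        perm_stat = direction_burden_stat(pooled[idx[:n_x]], pooled[idx[n_x:]])
        exceed += int(perm_stat >= observed)
    return (exceed + 1) / (n_perm + 1)

# Hypothetical simulated compositions: 20 samples per group, 3 taxa.
rng = np.random.default_rng(1)
group_a = rng.dirichlet([50, 5, 5], size=20)
group_b = rng.dirichlet([5, 5, 50], size=20)
print(permutation_pvalue(group_a, group_b, n_perm=499))
```

Re-deriving the clusters inside each permutation keeps the data-driven split from inflating the p-value. Note that when every row sums to 1 and no taxon has a mean difference of exactly zero, the two collapsed clusters are complementary, so the two terms of this simplified statistic coincide; the sketch is meant only to show the mechanics of the direction split and the burden collapse.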

Availability and implementation: An R package for MiDAT is available at https://github.com/zhangwei0125/MiDAT.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Microbiota*