Zero is not absence: censoring-based differential abundance analysis for microbiome data

Bioinformatics. 2024 Feb 1;40(2):btae071. doi: 10.1093/bioinformatics/btae071.

Abstract

Motivation: Microbiome data analysis faces the challenge of sparsity, with many entries recorded as zeros. In differential abundance analysis, excess zeros violate distributional assumptions and create ties, leading to an increased risk of type I errors and reduced statistical power.

Results: We developed a novel normalization method for microbiome data, called censoring-based analysis of microbiome proportions (CAMP), which treats zeros as censored observations and transforms raw read counts into tie-free, time-to-event-like data. This enables the use of survival analysis techniques, such as the Cox proportional hazards model, for differential abundance analysis. Extensive simulations demonstrate that CAMP achieves proper type I error control and high power. Applying CAMP to a human gut microbiome dataset, we identify 60 new differentially abundant taxa across geographic locations. CAMP overcomes sparsity challenges, enabling improved statistical analysis and providing valuable insights into microbiome data in various contexts.
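
The idea of treating zeros as censored observations can be illustrated with a minimal sketch. This is not the CAMP package's implementation; it assumes, purely for illustration, that each count is converted to a proportion, that the "time" is the negative log10 of that proportion (so higher abundance means a shorter time), and that a zero count is right-censored at a detection limit of one read per sample depth. The function name to_time_to_event and the example numbers are hypothetical.

```python
import math


def to_time_to_event(counts, depths):
    """Map read counts to (time, event) pairs for one taxon.

    Illustrative assumptions (not taken from the CAMP package):
      * time = -log10(count / depth) for a nonzero count, event = 1
      * a zero count is right-censored at -log10(1 / depth), event = 0,
        i.e. the true proportion is only known to lie below one read
        per sequencing depth.
    """
    pairs = []
    for count, depth in zip(counts, depths):
        if count > 0:
            # Observed "event": the proportion is known.
            time = math.log10(depth) - math.log10(count)
            pairs.append((time, 1))
        else:
            # Censored: the time is at least -log10(1 / depth).
            pairs.append((math.log10(depth), 0))
    return pairs


# Hypothetical counts for one taxon in four samples with two depths.
counts = [0, 10, 0, 100]
depths = [1000, 1000, 10000, 10000]
for time, event in to_time_to_event(counts, depths):
    print(f"time={time:.2f} event={event}")
```

Under these assumptions, the two zero counts receive different censoring times because their samples have different depths, so they are no longer tied. The resulting (time, event) pairs have the shape that survival methods such as a Cox proportional hazards model expect as input.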

Availability and implementation: The R package is available at https://github.com/lapsumchan/CAMP.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Data Analysis
  • Gastrointestinal Microbiome*
  • Humans
  • Microbiota*
  • Research Design