Zero-Inflated Beta Regression for Differential Abundance Analysis with Metagenomics Data

J Comput Biol. 2016 Feb;23(2):102-110. doi: 10.1089/cmb.2015.0157. Epub 2015 Dec 16.

Abstract

Metagenomics data have been growing rapidly owing to advances in next-generation sequencing (NGS) technologies. One goal of human microbial studies is to detect abundance differences across clinical conditions. Besides small sample sizes and high dimensionality, metagenomics data are usually represented as compositions (proportions) with a large number of zeros and skewed distributions. Efficient tools for handling such compositional data need to be developed. We propose a zero-inflated beta regression approach (ZIBSeq) for identifying differentially abundant features between multiple clinical conditions. The proposed method takes the sparse nature of metagenomics data into account and handles the compositional data efficiently. Compared with other available methods, the proposed approach demonstrates better performance, with large AUC values in most simulation studies. When applied to a human metagenomics data set, it also identifies biologically important taxa reported in previous studies. The software, written in R, is available upon request from the first author.
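To make the idea of a zero-inflated beta model concrete, the following is a minimal Python sketch of its log-likelihood for a single feature. It is not the ZIBSeq implementation, which is written in R. It assumes the common mean-precision parameterization of the beta distribution (shape parameters mu*phi and (1 - mu)*phi) and treats the zero probability, mean, and precision as fixed scalars rather than regressing them on clinical covariates. The function and parameter names (zib_log_density, zib_log_likelihood, p_zero, mu, phi) are illustrative assumptions, not names from the paper.

```python
import math


def zib_log_density(y, p_zero, mu, phi):
    """Log density of one proportion y in [0, 1) under a zero-inflated beta model.

    Assumed parameterization (not taken from the paper):
      P(y = 0) = p_zero
      y | y > 0 ~ Beta(mu * phi, (1 - mu) * phi)
    """
    if y == 0:
        # Point mass at zero captures the excess zeros in sparse abundance data.
        return math.log(p_zero)
    a = mu * phi
    b = (1.0 - mu) * phi
    # Log of the beta density, computed with log-gamma for numerical stability.
    log_beta_pdf = (
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + (a - 1.0) * math.log(y)
        + (b - 1.0) * math.log(1.0 - y)
    )
    return math.log(1.0 - p_zero) + log_beta_pdf


def zib_log_likelihood(ys, p_zero, mu, phi):
    """Sum of log densities over the samples of a single feature."""
    return sum(zib_log_density(y, p_zero, mu, phi) for y in ys)


# Hypothetical relative abundances of one taxon across six samples.
abundances = [0.0, 0.0, 0.12, 0.03, 0.0, 0.25]
print(zib_log_likelihood(abundances, p_zero=0.5, mu=0.1, phi=5.0))
```

In a regression setting, mu and p_zero would typically be linked to covariates (for example, through a logit link), and differential abundance would be assessed by testing the covariate coefficient. That step is omitted here.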

Keywords: algorithms; graphs and networks; machine learning; metagenomics; statistical models.