Generalized linear models with linear constraints for microbiome compositional data

Biometrics. 2019 Mar;75(1):235-244. doi: 10.1111/biom.12956. Epub 2018 Aug 10.

Abstract

Motivated by regression analysis for microbiome compositional data, this article considers generalized linear regression analysis with compositional covariates, where a set of linear constraints is imposed on the regression coefficients to account for the compositional nature of the data and to achieve subcompositional coherence. A penalized likelihood estimation procedure using a generalized accelerated proximal gradient method is developed to estimate the regression coefficients efficiently. A de-biased procedure is developed to obtain asymptotically unbiased and normally distributed estimates, which leads to valid confidence intervals for the regression coefficients. Simulation results show that the confidence intervals attain the correct coverage probability and that the estimates have smaller variances when the appropriate linear constraints are imposed. The methods are illustrated with a microbiome study that aims to identify bacterial species associated with inflammatory bowel disease (IBD) and to predict IBD from the fecal microbiome.
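The role of a linear constraint on the coefficients can be illustrated with a minimal sketch. It assumes, as is common for compositional covariates but not stated in the abstract, that the covariates enter on the log scale and that the constraint is a single zero-sum condition on the coefficients. Under that assumption the linear predictor does not depend on how the composition is scaled, so raw counts and normalized proportions give the same fit. The function names and the simulated data are hypothetical and are not taken from the article.

```python
import numpy as np


def project_zero_sum(beta):
    """Euclidean projection of beta onto the set {b : sum(b) = 0}."""
    return beta - beta.mean()


def linear_predictor(composition, beta):
    """Linear predictor on log-transformed covariates: log(x) @ beta."""
    return np.log(composition) @ beta


rng = np.random.default_rng(0)

# Hypothetical data: 5 samples, 4 taxa, strictly positive counts.
counts = rng.integers(1, 100, size=(5, 4)).astype(float)
proportions = counts / counts.sum(axis=1, keepdims=True)

raw_beta = rng.normal(size=4)        # unconstrained coefficients
beta = project_zero_sum(raw_beta)    # coefficients satisfying sum(beta) = 0

# With the zero-sum constraint, rescaling each sample leaves the predictor unchanged.
eta_counts = linear_predictor(counts, beta)
eta_props = linear_predictor(proportions, beta)

# Without the constraint, the predictor depends on the arbitrary scaling.
eta_counts_raw = linear_predictor(counts, raw_beta)
eta_props_raw = linear_predictor(proportions, raw_beta)

print(np.allclose(eta_counts, eta_props))
print(np.allclose(eta_counts_raw, eta_props_raw))
```

The projection used here only enforces the assumed constraint on a fixed coefficient vector. It is not the penalized estimation or de-biasing procedure described in the abstract.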

Keywords: Accelerated proximal gradient; De-biased estimation; High dimensional data; Metagenomics; Penalized estimation.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bacteria / isolation & purification*
  • Computer Simulation
  • Confidence Intervals
  • Feces / microbiology
  • Humans
  • Inflammatory Bowel Diseases / microbiology
  • Likelihood Functions
  • Linear Models*
  • Microbiota*
  • Regression Analysis