Sufficient dimension reduction for compositional data

Biostatistics. 2021 Oct 13;22(4):687-705. doi: 10.1093/biostatistics/kxz060.

Abstract

Recent efforts to characterize the human microbiome and its relation to chronic diseases have led to a surge in statistical development for compositional data. We develop likelihood-based sufficient dimension reduction methods (SDR) to find linear combinations that contain all the information in the compositional data on an outcome variable, i.e., are sufficient for modeling and prediction of the outcome. We consider several models for the inverse regression of the compositional vector or transformations of it, as a function of outcome. They include normal, multinomial, and Poisson graphical models that allow for complex dependencies among observed counts. These methods yield efficient estimators of the reduction and can be applied to continuous or categorical outcomes. We incorporate variable selection into the estimation via penalties and address important invariance issues arising from the compositional nature of the data. We illustrate and compare our methods and some established methods for analyzing microbiome data in simulations and using data from the Human Microbiome Project. Displaying the data in the coordinate system of the SDR linear combinations allows visual inspection and facilitates comparisons across studies.

Keywords: Count data; Penalized likelihood; Prediction; Regression; Sufficient statistic; visualization.

Publication types

  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Likelihood Functions
  • Microbiota*