A zero inflated log-normal model for inference of sparse microbial association networks

PLoS Comput Biol. 2021 Jun 18;17(6):e1009089. doi: 10.1371/journal.pcbi.1009089. eCollection 2021 Jun.

Abstract

The advent of high-throughput metagenomic sequencing has prompted the development of efficient taxonomic profiling methods that measure the presence, abundance and phylogeny of organisms in a wide range of environmental samples. Multivariate sequence-derived abundance data further has the potential to enable inference of ecological associations between microbial populations, but several technical issues need to be accounted for, such as the compositional nature of the data, its extreme sparsity and overdispersion, and the frequent need to operate in under-determined regimes. The ecological network reconstruction problem is frequently cast into the paradigm of Gaussian Graphical Models (GGMs), for which efficient structure inference algorithms are available, such as the graphical lasso and neighborhood selection. Unfortunately, GGMs and their variants cannot properly account for the extremely sparse patterns occurring in real-world metagenomic taxonomic profiles. In particular, structural zeros (as opposed to sampling zeros), which correspond to true absences of biological signals, are not properly handled by most statistical methods. We present here a zero-inflated log-normal graphical model (available at https://github.com/vincentprost/Zi-LN) specifically aimed at handling such "biological" zeros, and demonstrate significant performance gains over state-of-the-art statistical methods for the inference of microbial association networks, with the most notable gains obtained when analyzing taxonomic profiles displaying sparsity levels on par with real-world metagenomic datasets.
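To make the notion of a zero-inflated log-normal abundance concrete, the sketch below simulates a single taxon whose counts are exactly zero with some probability (a "structural" zero) and log-normally distributed otherwise, then recovers the three parameters by simple moment estimates. This is only a generic univariate illustration and not the graphical model or estimation procedure of the paper; the function names, the parameter values (70% zeros, log-mean 2.0, log-standard deviation 1.0) and the independence of the zero and non-zero parts are all assumptions made for the example.

```python
import math
import random
import statistics


def sample_zi_lognormal(n, pi_zero, mu, sigma, rng):
    """Draw n abundances: 0 with probability pi_zero, else exp(Normal(mu, sigma)).

    Hypothetical helper for illustration; not taken from the Zi-LN code base.
    """
    return [
        0.0 if rng.random() < pi_zero else math.exp(rng.gauss(mu, sigma))
        for _ in range(n)
    ]


def fit_zi_lognormal(values):
    """Estimate (pi_zero, mu, sigma) from observed abundances.

    The zero fraction estimates pi_zero; the mean and standard deviation of
    the log of the strictly positive values estimate mu and sigma.
    """
    log_positives = [math.log(v) for v in values if v > 0]
    pi_hat = 1.0 - len(log_positives) / len(values)
    mu_hat = statistics.fmean(log_positives)
    sigma_hat = statistics.pstdev(log_positives)
    return pi_hat, mu_hat, sigma_hat


# Assumed parameters: a sparse taxon absent from about 70% of samples.
rng = random.Random(0)
abundances = sample_zi_lognormal(5000, pi_zero=0.7, mu=2.0, sigma=1.0, rng=rng)
pi_hat, mu_hat, sigma_hat = fit_zi_lognormal(abundances)
print(f"zero fraction={pi_hat:.3f}, log-mean={mu_hat:.3f}, log-sd={sigma_hat:.3f}")
```

Treating the zeros as a separate mixture component, rather than log-transforming them with a pseudo-count, keeps the large mass at zero from distorting the estimated mean and variance of the non-zero part; this is the general motivation for zero-inflated models on sparse profiles.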

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology
  • Computer Simulation
  • Metagenome
  • Metagenomics / statistics & numerical data
  • Microbial Consortia / genetics
  • Microbial Consortia / physiology
  • Microbiota* / genetics
  • Microbiota* / physiology
  • Models, Biological*
  • Multivariate Analysis
  • Normal Distribution
  • Synthetic Biology

Grants and funding

V.P. was supported by a Ph.D. grant from CEA's High Commissioner office ("Thèse Phare"). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.