fastCCLasso: a fast and efficient algorithm for estimating correlation matrix from compositional data

Bioinformatics. 2024 May 2;40(5):btae314. doi: 10.1093/bioinformatics/btae314.

Abstract

Motivation: The composition and structure of microbial communities on the body surface are closely related to human health. Interactions among microbes can help us understand how the microecological environment forms and the biological mechanisms by which microorganisms influence host health. High-throughput sequencing technologies allow microbial abundances in a natural environment to be measured directly, without isolating microorganisms in culture. Sequencing experiments in microbiome studies measure the relative abundances of microbes; such data are called compositional data. Although many methods already exist for correlation analysis of compositional data, their computation time or accuracy still needs to be improved for current microbiome studies.
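To make the notion of compositional data concrete, the following minimal sketch (not taken from the fastCCLasso code) converts hypothetical absolute abundances into relative abundances and applies the centered log-ratio transform. The function names `to_composition` and `clr`, and the simulated data, are assumptions chosen for illustration only.

```python
import numpy as np

def to_composition(counts):
    """Divide each sample (row) by its total so that every row sums to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def clr(composition):
    """Centered log-ratio transform; assumes strictly positive entries."""
    log_x = np.log(composition)
    return log_x - log_x.mean(axis=1, keepdims=True)

# Hypothetical data: 200 samples of 5 independently generated taxa.
rng = np.random.default_rng(0)
absolute = rng.lognormal(mean=2.0, sigma=0.5, size=(200, 5))
relative = to_composition(absolute)

# Because every row of `relative` sums to 1, each row of its covariance
# matrix sums to 0: the constraint alone forces some negative covariance.
cov_rel = np.cov(relative, rowvar=False)
clr_values = clr(relative)
```

The zero row sums of `cov_rel` illustrate why correlations computed directly on relative abundances can mislead, which is the difficulty that methods for compositional data are designed to address.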

Results: We develop a fast and efficient algorithm, called fastCCLasso, based on penalized weighted least squares, for inferring the correlation structure of microbes from compositional data in microbiome studies. In extensive numerical experiments, the simulation results show that fastCCLasso outperforms its competitors in edge detection when inferring the correlation network. We also apply fastCCLasso to estimate microbial networks in microbiome studies, where it provides a conservative network with comparable false discovery counts, which are derived from shuffled data.
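The idea of deriving false discovery counts from shuffled data can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the shuffling scheme, so each taxon is permuted independently across samples here, and a plain Pearson correlation threshold stands in for the fastCCLasso estimator. The helper names `count_edges` and `shuffled_edge_counts` are hypothetical.

```python
import numpy as np

def count_edges(data, threshold):
    """Count taxon pairs whose absolute Pearson correlation exceeds threshold.

    Pearson correlation is a stand-in here, not the fastCCLasso estimator.
    """
    corr = np.corrcoef(data, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)
    return int(np.sum(np.abs(corr[upper]) > threshold))

def shuffled_edge_counts(data, threshold, n_shuffles, seed=0):
    """Edge counts after permuting each column independently (assumed scheme).

    Independent permutation breaks associations between taxa, so any edge
    detected in the shuffled data is counted as a false discovery.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    counts = []
    for _ in range(n_shuffles):
        shuffled = np.column_stack([rng.permutation(col) for col in data.T])
        counts.append(count_edges(shuffled, threshold))
    return counts

# Hypothetical data: taxa 0 and 1 are strongly associated, taxon 2 is not.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
example = np.column_stack([x, x + 0.01 * rng.normal(size=200), rng.normal(size=200)])

observed_edges = count_edges(example, threshold=0.9)
null_counts = shuffled_edge_counts(example, threshold=0.9, n_shuffles=20)
```

Comparing `observed_edges` with the distribution of `null_counts` gives a rough sense of how many detected edges could arise by chance under this assumed null.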

Availability and implementation: fastCCLasso is open source and freely available from https://github.com/ShenZhang-Statistics/fastCCLasso under the GNU LGPL v3.

MeSH terms

  • Algorithms*
  • Computational Biology / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Microbiota*
  • Software