Hierarchical Structured Component Analysis for Microbiome Data Using Taxonomy Assignments

IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1302-1312. doi: 10.1109/TCBB.2020.3039326. Epub 2022 Jun 3.

Abstract

The recent advent of high-throughput sequencing technology has enabled us to study the associations between human microbiome and diseases. The DNA sequences of microbiome samples are clustered as operational taxonomic units (OTUs) according to their similarity. The OTU table containing counts of OTUs present in each sample is used to measure correlations between OTUs and disease status and find key microbes for prediction of the disease status. Various statistical methods have been proposed for such microbiome data analysis. However, none of these methods reflects the hierarchy of taxonomy information. In this paper, we propose a hierarchical structural component model for microbiome data (HisCoM-microb) using taxonomy information as well as OTU table data. The proposed HisCoM-microb consists of two layers: one for OTUs and the other for taxa at the higher taxonomy level. Then we calculate simultaneously coefficient estimates of OTUs and taxa of the two layers inserted in the hierarchical model. Through this analysis, we can infer the association between taxa or OTUs and disease status, considering the impact of taxonomic structure on disease status. Both simulation study and real microbiome data analysis show that HisCoM-microb can successfully reveal the relations between each taxon and disease status and identify the key OTUs of the disease at the same time.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Computer Simulation
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Microbiota* / genetics
  • Phylogeny