Comparison of dimensional reduction methods for detecting and visualizing novel patterns in human and marine microbiome

IEEE Trans Nanobioscience. 2013 Sep;12(3):199-205. doi: 10.1109/TNB.2013.2263287. Epub 2013 May 16.

Abstract

Using metagenomics to detect the global structure of microbial communities remains a significant challenge. The structure and functions of a microbial community are complicated not only by the complex interactions among microbes but also by their interactions with confounding environmental factors. Recently, dimension reduction methods have been used extensively to investigate the complex structure embedded in metagenomic profiles, which summarize the abundance of functional or taxonomic categories in metagenomic studies. However, metagenomic profiles do not necessarily satisfy the "assumption of linearity" underlying these methods. It is therefore worth investigating whether nonlinear methods are appropriate for metagenomic analysis. In this paper, we compare three methods for visualizing and analyzing metagenomic profiles: two linear methods, principal component analysis (PCA) and nonnegative matrix factorization (NMF), and a nonlinear manifold learning method, Isomap. The methods are applied to a taxonomic profile from 33 human gut metagenomes and to a large-scale Pfam profile derived from 45 metagenomes from the Global Ocean Sampling expedition. We find that all three methods can discover interesting structures in the human gut taxonomic profile. Furthermore, Isomap identified a novel nonlinear structure of protein families. We explore the relationships between the identified nonlinear components and environmental factors of the global ocean. The results indicate that nonlinear methods could complement current linear methods in analyzing metagenomic datasets.
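The abstract does not state which implementations or parameters were used, so the following is only a minimal NumPy sketch of the three techniques being compared. It is not the authors' code. PCA is computed from the SVD of the centered profile. NMF uses the Lee and Seung multiplicative updates for the Frobenius loss. Isomap is built as a k-nearest-neighbor graph, followed by all-pairs shortest paths (Floyd-Warshall) and classical multidimensional scaling. Rows are samples (metagenomes) and columns are categories (taxa or Pfam families). The hypothetical helper `make_spiral_profile` is an illustrative assumption, as are `n_neighbors=6` and the component counts. The spiral stands in for a gradient that curves through abundance space, so straight-line distances between samples on neighboring turns understate how far apart they are along the gradient.

```python
import numpy as np


def pca(X, k):
    """Project samples (rows of X) onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)  # center each category (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # sample scores, shape (n_samples, k)


def nmf(X, k, n_iter=500, seed=0, eps=1e-12):
    """Nonnegative X ~= W @ H using Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + 1e-3  # strictly positive initialization
    H = rng.random((k, m)) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def isomap(X, k, n_neighbors=6):
    """Isomap: kNN graph -> geodesic (shortest-path) distances -> classical MDS."""
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # skip self at position 0
    for i in range(n):
        G[i, nbrs[i]] = D[i, nbrs[i]]
        G[nbrs[i], i] = D[i, nbrs[i]]  # keep the graph undirected
    for m in range(n):  # Floyd-Warshall all-pairs shortest paths
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (G ** 2) @ J  # double-centered squared geodesic distances
    vals, vecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def make_spiral_profile(n=200, n_features=10, seed=0):
    """Hypothetical nonnegative profile: samples along a spiral gradient t."""
    rng = np.random.default_rng(seed)
    t = np.linspace(1.5 * np.pi, 4.5 * np.pi, n)
    P = np.column_stack([t * np.cos(t), t * np.sin(t)])
    Q, _ = np.linalg.qr(rng.normal(size=(n_features, 2)))  # orthonormal columns
    X = P @ Q.T  # distance-preserving embedding into n_features categories
    return X - X.min(axis=0), t  # shift so every abundance is >= 0


X, t = make_spiral_profile()
scores_pca = pca(X, 2)   # linear projection
W, H = nmf(X, 3)         # parts-based nonnegative factors
coord_iso = isomap(X, 1)[:, 0]  # 1-D nonlinear coordinate
```

Samples on adjacent turns of the spiral are closer in Euclidean distance than along the curve. A linear projection can therefore place them near each other, whereas the Isomap coordinate should follow the ordering of `t`. For real analyses, scikit-learn's `PCA`, `NMF`, and `Isomap` classes are the more practical choice.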

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Databases, Genetic
  • Feces / microbiology
  • Humans
  • Metagenome / genetics*
  • Metagenomics / methods*
  • Microbiota / genetics*
  • Nonlinear Dynamics
  • Oceans and Seas
  • Principal Component Analysis
  • Proteins / genetics
  • Proteins / metabolism
  • Water Microbiology

Substances

  • Proteins