Exploration of noncoding sequences in metagenomes

PLoS One. 2013;8(3):e59488. doi: 10.1371/journal.pone.0059488. Epub 2013 Mar 25.

Abstract

Environment-dependent genomic features have been defined for different metagenomes, whose genes and associated processes are related to specific environments. Identifying ORFs and their functional categories is the most common method for associating functional and environmental features. However, an analysis based on finding ORFs misses noncoding sequences, so some regulatory or structural information in the metagenome may be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. We present evidence that a high proportion of noncoding sequences is discarded by common similarity-based methods in metagenomics, and describe the kind of relevant information those sequences contain. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, in which metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose that noncoding sequences carry relevant information for describing metagenomes, and that this information could be considered in whole-metagenome analyses in order to improve their organization, classification protocols, and relation to the environment.
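
As an illustration of two of the sequence patterns named above, (G+C) content and trinucleotide usage, the following is a minimal Python sketch. It is not the authors' pipeline: the abstract does not say how trinucleotides were counted, so the function names, the `step` parameter (1 for overlapping windows, 3 for in-frame, codon-like counting from position 0), and the choice to skip windows containing ambiguous bases are all assumptions made for this example.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def trinucleotide_usage(seq: str, step: int = 1) -> dict:
    """Return the relative frequency of each trinucleotide in seq.

    step=1 counts overlapping windows; step=3 counts non-overlapping,
    in-frame triplets starting at position 0 (an assumed convention).
    Windows containing anything other than A/C/G/T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(0, len(seq) - 2, step):
        tri = seq[i:i + 3]
        if set(tri) <= set("ACGT"):
            counts[tri] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tri: n / total for tri, n in counts.items()}


if __name__ == "__main__":
    read = "CAGCAGCAGATGATG"  # hypothetical read, not taken from the study
    print(gc_content(read))
    print(trinucleotide_usage(read, step=3))
```

Applied to every read in a metagenome, coding or not, profiles of this kind give one frequency vector per sample, which is the sort of input a clustering step would need. A read dominated by a single triplet under `step=3` would be a candidate trinucleotide repeat sequence in this simplified view.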

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Computational Biology / methods
  • Environmental Microbiology*
  • Genome, Bacterial
  • Humans
  • Metagenome*
  • Metagenomics*
  • Molecular Sequence Annotation
  • Open Reading Frames
  • RNA, Untranslated

Substances

  • RNA, Untranslated

Grants and funding

The authors' study was supported by Departamento Administrativo de Ciencia, Tecnología e Innovación – COLCIENCIAS from the Republic of COLOMBIA, Project No 6570-392-19990 for GeBix (Colombian Center for Genomics and Bioinformatics of Extreme Environments). Fabian Tobar was also the recipient of a student fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.