The metagenomic telescope

PLoS One. 2014 Jul 23;9(7):e101605. doi: 10.1371/journal.pone.0101605. eCollection 2014.

Abstract

Next-generation sequencing technologies have led to the discovery of numerous new microbial species in diverse environmental samples. Some of the new species contain genes never encountered before. Some of these genes encode proteins with novel functions, and some encode proteins that perform a well-known function in a novel way. We describe a tool, named the Metagenomic Telescope, that applies artificial intelligence methods and seems capable of identifying new protein functions even in well-studied model organisms. As a proof-of-principle demonstration of the Metagenomic Telescope, we considered DNA repair enzymes in the present work. First, we identified DNA repair proteins in well-known organisms (i.e., proteins in base excision repair, nucleotide excision repair, mismatch repair and DNA break repair); next, we applied multiple alignments and built hidden Markov profiles for each protein separately, across well-researched organisms; then, using public repositories of metagenomes originating from extreme environments, we identified DNA repair genes in the samples. While the phylogenetic classification of the metagenomic samples is typically not available, we hypothesized that some very special DNA repair strategies need to be applied in bacteria and Archaea living in those extreme circumstances. It is difficult to evaluate results obtained from mostly unknown species; therefore, we applied hidden Markov profiling again: for the DNA repair genes identified in the extreme metagenomes, we prepared new hidden Markov profiles (for each gene separately, subsequent to a cluster analysis), and we searched for similarities to those profiles in model organisms. We found well-known DNA repair proteins, numerous proteins with unknown functions, and also proteins with known, but different, functions in the model organisms.
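
The two-pass search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it swaps the hidden Markov profiles for a much simpler gap-free position-specific scoring profile, skips the cluster analysis step, and uses invented toy sequences, an invented threshold, and hypothetical function names (`build_profile`, `search`, `telescope`). It only shows the idea of building a profile from known proteins, collecting hits in a metagenome, re-profiling those hits, and searching a model organism with the new profile.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(aligned, pseudocount=1.0):
    """Per-column log-odds scores from equal-length, gap-free aligned sequences.

    Simplified stand-in for a profile HMM (no insert/delete states).
    """
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(ALPHABET)  # uniform background (an assumption)
    total = len(aligned) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(width):
        counts = Counter(s[col] for s in aligned)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def best_window(profile, seq):
    """Return (score, start) of the best-scoring window of profile width."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(seq) - width + 1):
        # Residues outside ALPHABET contribute a neutral score of 0.
        score = sum(profile[i].get(seq[start + i], 0.0) for i in range(width))
        if score > best[0]:
            best = (score, start)
    return best


def search(profile, database, threshold):
    """Map each sequence name scoring >= threshold to its matched segment."""
    hits = {}
    for name, seq in database.items():
        score, start = best_window(profile, seq)
        if score >= threshold:
            hits[name] = seq[start:start + len(profile)]
    return hits


def telescope(seed_alignment, metagenome, model_proteome, threshold):
    """Pass 1: seed profile -> metagenome. Pass 2: hit profile -> model organism."""
    env_hits = search(build_profile(seed_alignment), metagenome, threshold)
    if not env_hits:
        return env_hits, {}
    env_profile = build_profile(list(env_hits.values()))
    return env_hits, search(env_profile, model_proteome, threshold)


# Toy data, invented for illustration only (not real DNA repair sequences).
seed = ["GDSGKT", "GDSGKS", "GESGKT"]
metagenome = {"read1": "MMGDAGKTLL", "read2": "PPPPPPPPPP", "read3": "AAGDAGRTAA"}
model = {"protX": "WWGDAGRSWW", "protY": "LLLLLLLLLL"}
THRESHOLD = 4.0  # arbitrary cutoff chosen for the toy data

direct_hits = search(build_profile(seed), model, THRESHOLD)
env_hits, model_hits = telescope(seed, metagenome, model, THRESHOLD)
print("direct search of model organism:", direct_hits)
print("metagenome hits:", env_hits)
print("model hits via metagenome profile:", model_hits)
```

In this toy setting the seed profile alone does not reach the threshold on `protX`, while the profile rebuilt from the metagenome hits does. A realistic pipeline would use a dedicated profile HMM package (for example HMMER, which is an assumption here, since the abstract does not name the software) and would cluster the metagenomic hits before building the second set of profiles.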

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Archaea / classification
  • Archaea / genetics
  • Archaea / metabolism
  • Archaeal Proteins / genetics
  • Archaeal Proteins / metabolism
  • Artificial Intelligence*
  • Bacteria / classification
  • Bacteria / genetics
  • Bacteria / metabolism
  • Bacterial Proteins / genetics
  • Bacterial Proteins / metabolism
  • Cluster Analysis
  • Computational Biology / methods*
  • DNA Repair / genetics
  • Enzymes / genetics
  • Enzymes / metabolism
  • Humans
  • Markov Chains
  • Metagenome / genetics*
  • Metagenomics / methods*
  • Proteomics / methods
  • Reproducibility of Results

Substances

  • Archaeal Proteins
  • Bacterial Proteins
  • Enzymes

Grants and funding

This work was supported by the Hungarian Scientific Research Fund (OTKA NK 84008, K109486), the Baross program of the New Hungary Development Plan (3DSTRUCT, OMFB-00266/2010 REG-KM-09-1-2009-0050), the Hungarian Academy of Sciences (TTK IF-28/2012), and the European Commission FP7 Biostruct-X project (contract number 283570). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Co-author Vince Grolmusz is employed by Uratim Ltd. Uratim Ltd provided support in the form of salary for author Vince Grolmusz, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific role of the author is articulated in the ‘author contributions’ section.