Metagenome assembly through clustering of next-generation sequencing data using protein sequences

J Microbiol Methods. 2015 Feb:109:180-7. doi: 10.1016/j.mimet.2015.01.002. Epub 2015 Jan 6.

Abstract

Metagenomics, the study of environmental microbial communities, has attracted considerable attention owing to recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in shaping their environments, and the mechanisms behind their effects can be elucidated by investigating metagenomes. However, the complexity of metagenomes, such as the mixture of multiple microbial species and their differing abundances, makes metagenome assembly challenging. In this paper, we developed a new metagenome assembly method that uses protein sequences in addition to the NGS read sequences. Our method (i) builds read clusters using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. Using NGS reads simulated from real microbial genome sequences, we evaluated our method against four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that using protein sequences as a guide for assembly is promising.
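
The two steps of the method, (i) clustering reads by the protein they map to and (ii) deriving a consensus through probabilistic choices, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the mapping tool, the input format, or the form of the probabilistic choice. The tuple layout `(read_id, protein_id, offset, sequence)`, the function names, and the rule of sampling each base in proportion to its column frequency are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def cluster_reads_by_protein(hits):
    """Step (i), sketched: group reads by the protein they were mapped to.

    `hits` is a hypothetical iterable of (read_id, protein_id, offset, sequence)
    tuples, where `offset` is the assumed nucleotide position of the read
    relative to the start of the protein's coding region.
    """
    clusters = defaultdict(list)
    for _read_id, protein_id, offset, seq in hits:
        clusters[protein_id].append((offset, seq))
    return dict(clusters)


def probabilistic_consensus(placed_reads, rng):
    """Step (ii), sketched: pick one base per column by weighted sampling.

    Assumption: each base is drawn with probability proportional to how often
    it appears in that column. Columns with no coverage are simply skipped
    here; a real assembler would need to handle such gaps explicitly.
    """
    columns = defaultdict(Counter)
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1

    contig = []
    for pos in sorted(columns):
        bases, weights = zip(*sorted(columns[pos].items()))
        contig.append(rng.choices(bases, weights=weights, k=1)[0])
    return "".join(contig)


# Toy input: the reads agree wherever they overlap, so sampling is deterministic.
hits = [
    ("r1", "P1", 0, "ATGGC"),
    ("r2", "P1", 3, "GCTTA"),
    ("r3", "P2", 0, "CCCC"),
]
clusters = cluster_reads_by_protein(hits)
rng = random.Random(0)
contigs = {pid: probabilistic_consensus(reads, rng) for pid, reads in clusters.items()}
print(contigs)  # {'P1': 'ATGGCTTA', 'P2': 'CCCC'}
```

When reads disagree at a position, the sampled base varies with the random seed, which is one simple way to read "probabilistic choices"; the published method may weight or select bases differently.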

Keywords: Metagenome assembly; Next-generation sequencing; Protein sequences; Sequence clustering.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis*
  • Computational Biology / methods
  • Environmental Microbiology
  • Metagenomics / methods*
  • Proteins / genetics*

Substances

  • Proteins