k-SLAM: accurate and ultra-fast taxonomic classification and gene identification for large metagenomic data sets

David Ainsworth; Michael J E Sternberg; Come Raczy; Sarah A Butcher

doi:10.1093/nar/gkw1248

Nucleic Acids Res. 2017 Feb 28;45(4):1649-1656. doi: 10.1093/nar/gkw1248.

Authors

David Ainsworth¹, Michael J E Sternberg¹, Come Raczy², Sarah A Butcher³

Affiliations

¹ Centre for Integrative Systems Biology and Bioinformatics, Division of Molecular Biosciences, Faculty of Natural Sciences, Imperial College London, SW7 2AZ, London, UK.
² Illumina Inc., 5200 Illumina Way, San Diego, CA 92122, USA.
³ Bioinformatics Data Science Group, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, SW7 2AZ, London, UK.

Abstract

k-SLAM is a highly efficient algorithm for the characterization of metagenomic data. Unlike other ultra-fast metagenomic classifiers, it performs full sequence alignment, which allows gene identification and variant calling in addition to accurate taxonomic classification. A k-mer-based method provides greater taxonomic accuracy than other classifiers and a speed increase of three orders of magnitude over alignment-based approaches. Using alignments to find variants and genes along with their taxonomic origins enables novel strains to be characterized. k-SLAM's speed makes full taxonomic classification and gene identification tractable on modern large data sets. A pseudo-assembly method increases classification accuracy by up to 40% for species that have high sequence homology within their genus.
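
The abstract does not describe the algorithm in detail, so the following is only a minimal sketch of the general idea behind k-mer-based seeding, not k-SLAM's actual implementation. It assumes a toy in-memory index and finds which reference genome shares exact k-mers with a read, and at which offset, so that a full alignment could then be attempted there. All names (`kmers`, `build_index`, `seed_hits`, `genome_A`, `genome_B`) and the sequences are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield (offset, k-mer) for every window of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def build_index(genomes, k):
    """Map each k-mer to the (genome_name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in genomes.items():
        for pos, kmer in kmers(seq, k):
            index[kmer].append((name, pos))
    return index


def seed_hits(read, index, k):
    """Count shared k-mers per (genome, implied read start position).

    The implied start is genome offset minus read offset; many k-mers
    agreeing on the same start suggest a candidate alignment there.
    """
    hits = defaultdict(int)
    for rpos, kmer in kmers(read, k):
        for name, gpos in index.get(kmer, ()):
            hits[(name, gpos - rpos)] += 1
    return dict(hits)


# Hypothetical toy data, not taken from the paper.
genomes = {
    "genome_A": "ACGTACGTTAGCCGATAGGCTTAACG",
    "genome_B": "TTGACCGGTATTACCGGATCAGGTCA",
}
index = build_index(genomes, k=5)
read = "TAGCCGATAGG"
hits = seed_hits(read, index, k=5)

# The best-supported (genome, start) pair is the candidate for full alignment.
best = max(hits, key=hits.get)
print(best, hits[best])
```

In this sketch the candidate location would be passed to a full aligner, and a taxonomic label would be derived from the genomes that align well; both steps are left out here because the abstract gives no details on them.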

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Case-Control Studies
Computational Biology / methods*
Computational Biology / standards
DNA Barcoding, Taxonomic / methods*
DNA Barcoding, Taxonomic / standards
Gastrointestinal Microbiome
Genome, Bacterial
Humans
Liver Cirrhosis / microbiology
Metagenome*
Metagenomics / methods*
Metagenomics / standards
Reproducibility of Results
Shiga-Toxigenic Escherichia coli / classification
Shiga-Toxigenic Escherichia coli / genetics
