aKmerBroom: Ancient oral DNA decontamination using Bloom filters on k-mer sets

iScience. 2023 Sep 29;26(11):108057. doi: 10.1016/j.isci.2023.108057. eCollection 2023 Nov 17.

Abstract

Dental calculus samples are modeled as a mixture of DNA coming from dental plaque and contaminants. Current computational decontamination methods such as Recentrifuge and DeconSeq require either a reference database or sequenced negative controls, and therefore have limited use cases. We present a reference-free decontamination tool tailored for the removal of contaminant DNA of ancient oral sample called aKmerBroom. Our tool builds a Bloom filter of known ancient and modern oral k-mers, then scans an input set of ancient metagenomic reads using multiple passes to iteratively retain reads likely to be of oral origin. On synthetic data, aKmerBroom achieves over 89.53% sensitivity and 94.00% specificity. On real datasets, aKmerBroom shows higher read retainment (+60% on average) than other methods. We anticipate aKmerBroom will be a valuable tool for the processing of ancient oral samples as it will prevent contaminated datasets from being completely discarded in downstream analyses.

Keywords: Biocomputational method; Microbial genomics; Paleogenetics; Sequence analysis.