DOCEST: fast and accurate estimator of human NGS sequencing depth and error rate

Bioinform Adv. 2023 Jul 18;3(1):vbad084. doi: 10.1093/bioadv/vbad084. eCollection 2023.

Abstract

Motivation: Accurate estimation of next-generation sequencing depth of coverage is needed for detecting the copy number of repeated elements in the human genome. The common methods for estimating sequencing depth count the number of reads mapped to the genome or to subgenomic regions. Such methods are sensitive to mapping quality: contamination, or a large deviation of an individual genome from the reference, may bias the depth estimate.
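
For illustration, the read-counting approach described above reduces to dividing the total number of mapped bases by the length of the target region. The sketch below is a generic version of that calculation, not code from any particular tool; the function name and the example numbers are hypothetical.

```python
def depth_from_mapped_reads(n_mapped_reads: int, read_length: int, region_length: int) -> float:
    """Mean depth of coverage = total mapped bases / length of the region."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return n_mapped_reads * read_length / region_length


# Hypothetical run: 600 million reads of 150 bp against a 3 Gbp genome.
print(depth_from_mapped_reads(600_000_000, 150, 3_000_000_000))  # 30.0

# If 10% of those reads fail to map (or map elsewhere), the estimate drops
# accordingly, which is the sensitivity to mapping quality noted above.
print(depth_from_mapped_reads(540_000_000, 150, 3_000_000_000))  # 27.0
```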

Results: Here, we present an algorithm and implementation for estimating both the sequencing depth and error rate from unmapped reads using a uniquely filtered k-mer set. On simulated reads with 20× coverage, the margin of error was less than 0.01%. At 0.01× coverage and in the presence of 10-fold contamination, the precision was within 2% for depth and within 10% for error rate.
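
The abstract does not describe the estimator itself, so the following is only a rough sketch of how depth and error rate could in principle be recovered from k-mer counts in unmapped reads. It is not the DOCEST algorithm. It assumes a simple textbook model: a k-mer that occurs once in the genome is seen error-free on average D * (L - k + 1) / L * (1 - e)^k times, where D is the depth, L the read length, and e a uniform per-base error rate. Counting with two k-mer sizes then gives two equations in the two unknowns. All function names, the two-size scheme, and the parameter values are assumptions made for this example.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def mean_unique_kmer_count(reads, unique_kmers, k):
    """Mean number of times each k-mer in `unique_kmers` (canonical form) occurs in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = canonical(read[i:i + k])
            if kmer in unique_kmers:
                counts[kmer] += 1
    return sum(counts.values()) / len(unique_kmers)


def expected_count(depth, error_rate, read_length, k):
    """Expected error-free count of a single-copy k-mer under the assumed model."""
    return depth * (read_length - k + 1) / read_length * (1 - error_rate) ** k


def estimate_depth_and_error(c1, c2, k1, k2, read_length):
    """Solve the model for (depth, error_rate) from mean counts at two k-mer sizes, k1 < k2."""
    # Dividing the two model equations leaves (1 - e)^(k2 - k1) on one side.
    ratio = (c2 * (read_length - k1 + 1)) / (c1 * (read_length - k2 + 1))
    error_rate = 1 - ratio ** (1 / (k2 - k1))
    depth = c1 * read_length / ((read_length - k1 + 1) * (1 - error_rate) ** k1)
    return depth, error_rate


# Round trip on model-generated counts (hypothetical values: 20x depth, 0.5% error).
c21 = expected_count(20.0, 0.005, 150, 21)
c31 = expected_count(20.0, 0.005, 150, 31)
depth, error_rate = estimate_depth_and_error(c21, c31, 21, 31, 150)
print(round(depth, 6), round(error_rate, 6))
```

Because only k-mers from a fixed, pre-selected set are counted, reads from a contaminating organism contribute little to the counts, which is one plausible reason a k-mer-based estimate can tolerate contamination better than read mapping. Real data would also need handling of variable read lengths and non-uniform errors, which this sketch ignores.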

Availability and implementation: The DOCEST program and database can be downloaded from https://bioinfo.ut.ee/docest/.

Supplementary information: Supplementary data are available at Bioinformatics Advances online.