JASPER: A fast genome polishing tool that improves accuracy of genome assemblies

Alina Guo; Steven L Salzberg; Aleksey V Zimin

doi:10.1371/journal.pcbi.1011032

PLoS Comput Biol. 2023 Mar 31;19(3):e1011032. doi: 10.1371/journal.pcbi.1011032. eCollection 2023 Mar.

Authors

Alina Guo ¹ ², Steven L Salzberg ¹ ³ ⁴ ⁵, Aleksey V Zimin ¹ ³

Affiliations

¹ Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America.
² Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, United States of America.
³ Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, United States of America.
⁴ Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, United States of America.
⁵ Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, United States of America.

Abstract

Advances in long-read sequencing technologies have dramatically improved the contiguity and completeness of genome assemblies. Using the latest nanopore-based sequencers, we can generate enough data for the assembly of a human genome from a single flow cell. With the long-read data from these sequencers, we can now routinely produce de novo genome assemblies in which half or more of a genome is contained in megabase-scale contigs. Assemblies produced from nanopore data alone, though, have relatively high error rates and can benefit from a process called polishing, in which more-accurate reads are used to correct errors in the consensus sequence. In this manuscript, we present a novel tool for genome polishing called JASPER (Jellyfish-based Assembly Sequence Polisher for Error Reduction). In contrast to many other polishing methods, JASPER gains efficiency by avoiding the alignment of reads to the assembly. Instead, JASPER uses a database of k-mer counts that it creates from the reads to detect and correct errors in the consensus. Our experiments demonstrate that JASPER is faster than alignment-based polishers, and both faster and more accurate than other k-mer-based polishing methods. We also introduce the idea of using a polishing tool to create population-specific reference genomes, and illustrate this idea using sequence data from multiple individuals from Tokyo, Japan.
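The alignment-free idea described in the abstract can be illustrated with a minimal Python sketch. This is not JASPER's actual algorithm or code: an in-memory `Counter` stands in for the k-mer count database, only single-base substitutions are attempted, and the names `count_kmers`, `weak_kmer_starts`, `polish_substitutions`, and the `min_count` threshold are hypothetical choices made for illustration.

```python
from collections import Counter

BASES = "ACGT"


def count_kmers(reads, k):
    """Count every k-mer in the accurate reads (a stand-in for a k-mer database)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_starts(seq, counts, k, min_count=2):
    """Start positions of assembly k-mers seen fewer than min_count times in the reads."""
    return [i for i in range(len(seq) - k + 1)
            if counts[seq[i:i + k]] < min_count]


def polish_substitutions(seq, counts, k, min_count=2):
    """Try single-base substitutions wherever the assembly contains weak k-mers.

    Assumption for this sketch: a substitution is accepted only if every
    k-mer covering the edited position is then supported by the reads.
    """
    bases = list(seq)
    n = len(bases)

    def weak_covering(pos):
        # Number of weak k-mers among those that overlap position pos.
        first = max(0, pos - k + 1)
        last = min(pos, n - k)
        return sum(
            counts["".join(bases[s:s + k])] < min_count
            for s in range(first, last + 1)
        )

    for pos in range(n):
        if weak_covering(pos) == 0:
            continue  # all covering k-mers are supported; nothing to fix here
        original = bases[pos]
        for alt in BASES:
            if alt == original:
                continue
            bases[pos] = alt
            if weak_covering(pos) == 0:
                break  # keep the substitution that restores read support
        else:
            bases[pos] = original  # no alternative helped; leave the base as is
    return "".join(bases)
```

In this sketch, detection and correction both reduce to hash lookups of k-mers, which is why no read-to-assembly alignment is needed. Insertions, deletions, and heterozygous sites would need additional handling that is not shown here.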

Copyright: © 2023 Guo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Genome, Human / genetics
High-Throughput Nucleotide Sequencing*
Humans
Metagenomics
Nanopores*
Sequence Analysis, DNA
