yacrd and fpa: upstream tools for long-read genome assembly

Bioinformatics. 2020 Jun 1;36(12):3894-3896. doi: 10.1093/bioinformatics/btaa262.

Abstract

Motivation: Genome assembly is increasingly performed on long, uncorrected reads. Assembly quality can be degraded by unfiltered chimeric reads, and storing all read overlaps can take up to terabytes of disk space.

Results: We introduce two tools: yacrd for chimera removal and read scrubbing, and fpa for filtering out spurious overlaps. We show that yacrd results in higher-quality assemblies and is one hundred times faster than the best available alternative.
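To make the idea of overlap filtering concrete, the following is a minimal sketch and not fpa's actual implementation. It assumes overlaps are stored in PAF, the tab-separated format written by overlappers such as minimap2; the function name `filter_overlaps`, the 2000 bp threshold and the example records are hypothetical choices made only for illustration.

```python
def filter_overlaps(paf_lines, min_block_len=2000, drop_self=True):
    """Yield PAF records that pass two simple filters.

    Illustrative only: the threshold and the filters are assumptions,
    not the criteria used by fpa.
    """
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip records lacking the 12 mandatory PAF columns
        query_name = fields[0]
        target_name = fields[5]
        block_len = int(fields[10])  # column 11: alignment block length
        if drop_self and query_name == target_name:
            continue  # a read overlapping itself carries no assembly signal
        if block_len < min_block_len:
            continue  # short overlaps are more likely to be spurious
        yield line


# Hypothetical records: one long overlap, one self-overlap, one short overlap.
example = [
    "read1\t10000\t0\t5000\t+\tread2\t12000\t7000\t12000\t4500\t5000\t255\n",
    "read1\t10000\t0\t10000\t+\tread1\t10000\t0\t10000\t10000\t10000\t255\n",
    "read3\t8000\t100\t600\t-\tread4\t9000\t200\t700\t400\t500\t255\n",
]
kept = list(filter_overlaps(example))
```

Because the function is a generator over lines, it can be applied to a stream of overlaps as they are produced, so that rejected records never need to be written to disk.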

Availability and implementation: https://github.com/natir/yacrd and https://github.com/natir/fpa.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • High-Throughput Nucleotide Sequencing*
  • Sequence Analysis, DNA
  • Software*