Atria: an ultra-fast and accurate trimmer for adapter and quality trimming

GigaByte. 2021 Oct 15:2021:gigabyte31. doi: 10.46471/gigabyte.31. eCollection 2021.

Abstract

As next-generation sequencing advances, adapters attached to reads and low-quality bases continue to hinder downstream analysis, both directly and indirectly. For example, they can produce false-positive single nucleotide polymorphisms (SNPs) and fragmented assemblies. There is a need for a fast trimming algorithm that removes adapters precisely, especially in read tails of relatively low quality. Here, we present Atria, a trimming program that matches the adapters in paired reads and finds possible overlapped regions using a fast and carefully designed byte-based matching algorithm (O(n) time with O(1) space). Atria also implements multi-threading in both sequence processing and file compression, and it supports single-end reads. Compared with other trimmers, Atria performs favorably in various trimming and runtime benchmarks on both simulated and real data. The byte-based matching algorithm is fast and lightweight, and it can be used in other short-sequence matching applications, such as primer search and seed scanning before alignment.
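
To make the idea of bit-packed short-sequence matching concrete, the following is a minimal Python sketch. It is not Atria's implementation, and the abstract does not specify the encoding Atria uses. The function name `find_adapter`, the one-hot 4-bit encoding in `ENCODE`, and the `max_mismatches` parameter are all assumptions made for illustration. The sketch packs the adapter and a sliding window of the read into integers, then counts matching bases with a single AND and a population count. For a fixed, short adapter this amounts to one pass over the read with constant extra state.

```python
# Hypothetical illustration only: NOT Atria's actual algorithm or encoding.
# One-hot 4-bit codes: identical bases share exactly one set bit,
# different bases share none, and N (0b0000) matches nothing.
ENCODE = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000, "N": 0b0000}


def find_adapter(read, adapter, max_mismatches=1):
    """Return (start, mismatches) of the best adapter hit in read, or None.

    Only substitutions are considered (no insertions or deletions).
    Among hits with the same mismatch count, the leftmost one is kept.
    """
    k = len(adapter)
    if k == 0 or len(read) < k:
        return None

    # Pack the adapter into one integer, 4 bits per base.
    adapter_bits = 0
    for base in adapter:
        adapter_bits = (adapter_bits << 4) | ENCODE.get(base, 0)

    mask = (1 << (4 * k)) - 1  # keeps only the last k bases of the window
    window = 0
    best = None
    for i, base in enumerate(read):
        # Shift the new base in and drop the oldest one.
        window = ((window << 4) | ENCODE.get(base, 0)) & mask
        if i + 1 < k:
            continue  # window is not full yet
        # Each matching position contributes exactly one set bit.
        matches = bin(window & adapter_bits).count("1")
        mismatches = k - matches
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (i + 1 - k, mismatches)
    return best


if __name__ == "__main__":
    # Eight read bases followed by an adapter prefix.
    print(find_adapter("ACGTACGTAGATCGGAAGAGC", "AGATCGGAAGAGC"))
```

A read would then be trimmed at the returned start position, for example `read[:start]`. A real trimmer additionally has to handle adapters truncated at the read end, base qualities, and paired-end overlap, none of which this sketch attempts.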

Grants and funding

This study was partially funded by the Interdepartmental funding of Genomics Research and Development Initiatives (GRDI), Canada, to XL. The financial support of CFIA and the University of Prince Edward Island to JC is greatly appreciated.