WGA-LP: a pipeline for whole genome assembly of contaminated reads

Bioinformatics. 2022 Jan 12;38(3):846-848. doi: 10.1093/bioinformatics/btab719.

Abstract

Summary: Whole genome assembly (WGA) of bacterial genomes from short reads has become a common task as advances in DNA sequencing technology have made sequencing cheaper. There is no absolute gold standard for assembling a genome: the process requires a sequence of steps, each of which can involve combinations of many different tools. However, the quality of the final assembly is always strongly related to the quality of the input data. With this in mind, we built WGA-LP, a package that connects state-of-the-art programs for microbial analysis with novel scripts to check and improve the quality of both samples and resulting assemblies. With its conservative decontamination approach, WGA-LP has been shown to be capable of creating high-quality assemblies even from contaminated reads.

Availability and implementation: WGA-LP is available on GitHub (https://github.com/redsnic/WGA-LP) and Docker Hub (https://hub.docker.com/r/redsnic/wgalp). The web app for node visualization is hosted by shinyapps.io (https://redsnic.shinyapps.io/ContigCoverageVisualizer/).

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genome, Bacterial
  • High-Throughput Nucleotide Sequencing* / methods
  • Sequence Analysis, DNA / methods
  • Software*