gapFinisher: A reliable gap filling pipeline for SSPACE-LongRead scaffolder output

PLoS One. 2019 Sep 9;14(9):e0216885. doi: 10.1371/journal.pone.0216885. eCollection 2019.

Abstract

Unknown sequences, or gaps, are present in many published genomes across public databases. Gap filling is an important finishing step in de novo genome assembly, especially for large genomes. The gap filling problem is nontrivial, and while many computational tools partially solve it, several have shortcomings in the reliability and correctness of their output, i.e. the gap-filled draft genome. SSPACE-LongRead is a scaffolding tool that uses long reads from multiple third-generation sequencing platforms to find links between contigs and combine them. The long reads potentially contain the sequence information needed to fill the gaps created during scaffolding, but SSPACE-LongRead currently lacks this functionality. We present an automated pipeline called gapFinisher that processes SSPACE-LongRead output to fill gaps after scaffolding. gapFinisher is based on the controlled use of a previously published gap filling tool, FGAP, and works on all standard Linux/UNIX command lines. We compare the performance of gapFinisher against two other published gap filling tools, PBJelly and GMcloser. We conclude that gapFinisher can fill gaps in draft genomes quickly and reliably. In addition, the serial design of gapFinisher makes it scale well from prokaryote genomes to larger genomes with no increase in the computational footprint.
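To make the notion of a gap concrete, the following minimal sketch locates gaps in scaffold sequences. It is not part of gapFinisher, FGAP, or SSPACE-LongRead; it assumes only that gaps appear as runs of N characters in FASTA-formatted scaffolds, and the function names (parse_fasta, find_gaps) and the example data are hypothetical.

```python
import re


def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence.

    Hypothetical helper for illustration; the record name is the first
    whitespace-delimited token after '>'.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def find_gaps(sequence, min_len=1):
    """Return half-open (start, end) coordinates of each run of N or n.

    Assumption: a gap is any run of at least min_len consecutive N characters.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


# Made-up example data: one scaffold with two gaps, one without gaps.
EXAMPLE = ">scaffold1\nACGTNNNNNACGT\nNNACG\n>scaffold2\nACGTACGT\n"

if __name__ == "__main__":
    for name, seq in parse_fasta(EXAMPLE).items():
        gaps = find_gaps(seq)
        total = sum(end - start for start, end in gaps)
        print(name, "gaps:", gaps, "gap bases:", total)
```

Listing gap coordinates before and after a gap filling run is one simple way to check how many gaps a tool closed; whether the filled sequence is correct requires a separate comparison against a trusted reference.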

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Bacteria / genetics
  • Benchmarking
  • Contig Mapping / statistics & numerical data*
  • Databases, Genetic
  • Genome*
  • Genomics / methods*
  • Genomics / statistics & numerical data
  • High-Throughput Nucleotide Sequencing
  • Seals, Earless / genetics
  • Sequence Analysis, DNA / statistics & numerical data*
  • Software*

Grants and funding

This work was supported by the Saimaa Ringed Seal Genome Project (SRSGP) research grants from the Jane and Aatos Erkko Foundation [4-2013 and 5-2017 to J.J. & P.A.] (https://jaes.fi/en/). This work was also supported by the Helsinki University Integrated Life Sciences doctoral programme (ILS) [3-2016 to J.I.K.] (https://www.helsinki.fi/en/research/doctoral-education/doctoral-schools-and-programmes/doctoral-school-in-health-sciences/doctoral-programme-in-integrative-life-science). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.