PEP_scaffolder: using (homologous) proteins to scaffold genomes

Bioinformatics. 2016 Oct 15;32(20):3193-3195. doi: 10.1093/bioinformatics/btw378. Epub 2016 Jun 22.

Abstract

Motivation: Recovering gene structures is one of the important goals of genome assembly. In low-quality assemblies, and even in some high-quality ones, certain gene regions remain incomplete; novel scaffolding approaches are therefore required to complete these regions.

Results: We developed PEP_scaffolder, an efficient and fast genome scaffolding method that uses proteins to scaffold genomes. The pipeline aims to recover protein-coding gene structures. We tested the method on human contigs; with human UniProt proteins as guides, N50 size increased by 17% with an accuracy of ∼97%. PEP_scaffolder also raised the proportion of fully covered proteins among all proteins to a level close to that in the finished genome. Using orthologs from distant species as guides, the method achieved a high accuracy of 91%. On simulated fly contigs, PEP_scaffolder outperformed other scaffolders, with the shortest running time and the highest accuracy.
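
For readers unfamiliar with the N50 metric cited in the results, the following is a minimal sketch of the standard definition: the length L such that sequences of length L or longer contain at least half of the total assembled bases. It is not taken from PEP_scaffolder; the function name `n50` and the example lengths are hypothetical, and gap lengths introduced by joining contigs are ignored for simplicity.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (standard definition)."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths before scaffolding.
contigs = [100, 80, 60, 40, 20]
# Hypothetical result after joining the two longest contigs into one scaffold
# (any gap between them is ignored here).
scaffolds = [180, 60, 40, 20]

print(n50(contigs), n50(scaffolds))
```

Joining contigs leaves the total number of bases unchanged but concentrates them in fewer, longer sequences, which is why successful scaffolding raises N50.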

Availability and implementation: The software is freely available at http://www.fishbrowser.org/software/PEP_scaffolder/

Contact: lijt@cafs.ac.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms
  • Animals
  • Genome*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Sequence Analysis, DNA*
  • Sequence Homology
  • Software