Stacking up RADSeq assembly programs: From complete hit to completely abysmal

Mol Ecol Resour. 2020 Mar;20(2):357-359. doi: 10.1111/1755-0998.13140. Epub 2020 Feb 20.

Abstract

Decreasing sequencing costs have driven a rapid expansion of novel genotyping methods. One such method exploits restriction enzyme cut sites to generate genome-wide but reduced representation sequencing libraries (RRLs), also termed genotyping by sequencing or restriction-site associated DNA sequencing. Without a reference genome, the resulting short sequence reads must be assembled de novo. Many assembly programs are available, most of them not developed explicitly for RRL data, and we know little about their effectiveness. In this issue of Molecular Ecology Resources, LaCava et al. (2020) systematically evaluate six commonly used programs and two commonly varied parameters for complete and accurate assembly of RRLs, using simulated double digests of Homo sapiens and Arabidopsis thaliana genomes with varied mutation rates and types. The authors find substantial variation in performance across assembly programs. The most consistently high-performing assembler (CD-HIT; Li and Godzik, 2006) is infrequently used according to their literature survey, while several others fail to produce complete, accurate assemblies under many conditions. LaCava et al. additionally recommend best practices for choosing parameters and for evaluating future assembly programs, advice that molecular ecologists working to assemble sequences of all kinds should take to heart.
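De novo assembly of RRL reads largely comes down to clustering reads that derive from the same locus, and the similarity threshold used to decide "same locus" is a typical tuning parameter for such tools. The sketch below illustrates that idea with a greedy, incremental clustering loop loosely in the spirit of CD-HIT. It is a simplified teaching example, not CD-HIT's algorithm and not the method of LaCava et al.: it assumes equal-length reads, measures identity by position-wise matches (no indels, no alignment), and omits CD-HIT's short-word filtering. The function names `identity` and `greedy_cluster` are hypothetical.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length reads.

    Simplifying assumption: no insertions or deletions, so no alignment step.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length reads")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_cluster(reads: List[str], threshold: float) -> List[List[str]]:
    """Greedy incremental clustering (hypothetical, simplified).

    Each read joins the first existing cluster whose representative
    (the cluster's first member) it matches at >= threshold identity;
    otherwise it seeds a new cluster and becomes that cluster's representative.
    """
    clusters: List[List[str]] = []
    for read in reads:
        for cluster in clusters:
            if identity(read, cluster[0]) >= threshold:
                cluster.append(read)
                break
        else:
            # No representative was similar enough: start a new putative locus.
            clusters.append([read])
    return clusters


# Toy reads: two loci, each carrying one variant read that differs at a single site.
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA", "ACGTACGTAC"]

for t in (0.9, 1.0):
    sizes = [len(c) for c in greedy_cluster(reads, t)]
    print(f"threshold={t}: cluster sizes {sizes}")
```

With a 0.9 threshold the toy reads collapse into two clusters (sizes 3 and 2), one per locus; with a strict 1.0 threshold each single-site variant is split into its own cluster, giving four. This shows in miniature why threshold choice affects assembly completeness and accuracy. Real tools also handle indels and sequencing error and sort input (CD-HIT processes longer sequences first), all of which this sketch leaves out.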

Keywords: bioinformatics/phyloinformatics; de novo assembly; genomics/proteomics; genotyping by sequencing; reduced representation library; restriction-site associated DNA sequencing.

MeSH terms

  • Gene Library
  • Genome
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing / methods*