Recent advances in sequence assembly: principles and applications

Brief Funct Genomics. 2017 Nov 1;16(6):361-378. doi: 10.1093/bfgp/elx006.

Abstract

The application of advanced sequencing technologies and the rapid growth of sequence data have led to increasing interest in DNA sequence assembly. However, repeats and polymorphisms occur frequently in genomes, and each affects assembly differently. Further, many new applications for sequencing, such as metagenomics, which involves multiple species, have emerged in recent years. These applications not only increase complexity but also hinder efficient short-read assembly. This article reviews the theoretical foundations that underlie current mapping-based and de novo assembly, and highlights the key issues to be considered along with feasible solutions. It focuses on how individual processes, such as optimal k-mer determination and error correction in assembly, rely on intelligent strategies or high-performance computation. We also survey the primary algorithms and software and discuss the emerging challenges in assembly.
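As a rough illustration of the k-mer and de Bruijn graph concepts named in the abstract, the sketch below splits reads into k-mers and links each (k-1)-mer prefix to its (k-1)-mer suffix. It is a minimal toy example, not the method of any assembler surveyed in the review; the function names `kmers` and `de_bruijn_graph`, the toy reads, and the choice of k = 3 are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every overlapping substring of length k in the read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes that follow it.

    Repeated edges are kept, so the list length reflects how often
    a k-mer was observed across the reads.
    """
    graph = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Hypothetical toy reads; real inputs would come from a FASTQ file.
reads = ["ACGTAC", "GTACGG"]
graph = de_bruijn_graph(reads, k=3)
```

In this toy graph, a node with more than one distinct successor marks a branch point, which is where repeats and polymorphisms make the assembly path ambiguous. Smaller k values produce more such branches, while larger k values need higher coverage, which is one reason the choice of k matters.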

Keywords: DNA assembly; de Bruijn graph; fragment; k-mer; repeat.

Publication types

  • Review

MeSH terms

  • Algorithms
  • Computer Graphics
  • DNA / genetics*
  • Metagenomics
  • Polymorphism, Single Nucleotide
  • Sequence Analysis, DNA / methods*
  • Software

Substances

  • DNA