Recent advances in sequence assembly: principles and applications

Qingfeng Chen; Chaowang Lan; Liang Zhao; Jianxin Wang; Baoshan Chen; Yi-Ping Phoebe Chen

doi:10.1093/bfgp/elx006

Brief Funct Genomics. 2017 Nov 1;16(6):361-378. doi: 10.1093/bfgp/elx006.

PMID: 28453648

Abstract

The application of advanced sequencing technologies and the rapid growth of sequence data have led to increasing interest in DNA sequence assembly. However, repeats and polymorphisms occur frequently in genomes, and each affects assembly differently. Further, many new sequencing applications, such as metagenomics involving multiple species, have emerged in recent years. These not only increase complexity but also hinder efficient short-read assembly. This article reviews the theoretical foundations that underlie current mapping-based and de novo assembly, and highlights the key issues and feasible solutions that need to be considered. It focuses on how individual steps, such as optimal k-mer determination and error correction, rely on intelligent strategies or high-performance computation. We also survey the primary algorithms and software and discuss emerging challenges in assembly.
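
As a rough illustration of the k-mer and de Bruijn graph concepts named in the abstract, the following minimal Python sketch counts k-mers in a few toy reads and builds a de Bruijn graph from the k-mers that pass a frequency threshold. It is not taken from the article: the function names (count_kmers, build_de_bruijn), the toy reads, and the min_count threshold are all hypothetical, and discarding low-frequency k-mers is only a crude stand-in for the error-correction strategies the review surveys.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(kmer_counts, min_count=1):
    """Build a de Bruijn graph from k-mer counts.

    Nodes are (k-1)-mers; each retained k-mer contributes one edge
    from its prefix to its suffix. K-mers seen fewer than min_count
    times are dropped (an assumed, simplistic error filter).
    """
    graph = defaultdict(list)
    for kmer, n in kmer_counts.items():
        if n >= min_count:
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy reads (hypothetical); the last read carries one substituted base.
reads = ["ACGTAC", "CGTACG", "ACGTAC", "ACGTTC"]
counts = count_kmers(reads, k=4)
graph = build_de_bruijn(counts, min_count=2)
```

In this toy case the filter removes the two k-mers introduced by the substituted base, but it also removes a genuine k-mer that happens to be covered only once, which hints at why the choice of k and of the frequency threshold is a trade-off rather than a fixed setting.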

Keywords: DNA assembly; de Bruijn graph; fragment; k-mer; repeat.

Publication types

Review

MeSH terms

Algorithms
Computer Graphics
DNA / genetics*
Metagenomics
Polymorphism, Single Nucleotide
Sequence Analysis, DNA / methods*
Software

Substances

DNA