Next-generation sequence assembly: four stages of data processing and computational challenges

Sara El-Metwally; Taher Hamza; Magdi Zakaria; Mohamed Helmy

doi:10.1371/journal.pcbi.1003345

PLoS Comput Biol. 2013;9(12):e1003345. doi: 10.1371/journal.pcbi.1003345. Epub 2013 Dec 12.

Authors

Sara El-Metwally¹, Taher Hamza¹, Magdi Zakaria¹, Mohamed Helmy²

Affiliations

¹ Computer Science Department, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt.
² Botany Department, Faculty of Agriculture, Al-Azhar University, Cairo, Egypt; Biotechnology Department, Faculty of Agriculture, Al-Azhar University, Cairo, Egypt.

Abstract

Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, assembling the reads they produce remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. We discuss these as a framework of four stages for data analysis and processing and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that current assemblers face in the next-generation environment to determine the current state of the art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.

Publication types

Research Support, Non-U.S. Gov't
Review

MeSH terms

Algorithms
Base Sequence
DNA / chemistry*
Genome
Sequence Alignment
Sequence Analysis, DNA / methods*
Software

Substances

DNA

Grants and funding

This work was supported by the Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research [No. 236172] and the Egyptian Ministry of Higher Education, the Egyptian Bureau of Culture, Science and Education - Tokyo to MH; and Google Anita Borg Memorial Scholarship to SE. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.