Survey of gene splicing algorithms based on reads

Bioengineered. 2017 Nov 2;8(6):750-758. doi: 10.1080/21655979.2017.1373538. Epub 2017 Sep 21.

Authors

Xiuhua Si¹, Qian Wang², Lei Zhang¹, Ruo Wu¹, Jiquan Ma¹

Affiliations

¹ Department of Computer Science & Technology, Heilongjiang University, Harbin, China.
² Shandong Aerospace Institute of Electronic Technology, Yantai, China.

Abstract

Gene splicing is the process of assembling a large number of unordered short sequence fragments (reads) into the original genome sequence as accurately as possible. This article reviews several popular read-based splicing algorithms, including reference-genome-based algorithms and de novo splicing algorithms (Greedy-extension, Overlap-Layout-Consensus graph, and De Bruijn graph). We also discuss a new splicing method based on the MapReduce strategy and Hadoop. By comparing these algorithms, we draw some conclusions and offer suggestions for gene splicing research.
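As a rough illustration of the De Bruijn graph approach named above, the following minimal Python sketch splits reads into k-mers, links each k-mer's (k-1)-mer prefix to its (k-1)-mer suffix, and spells out a contig along an unambiguous path. The function names, the toy reads, and the choice k = 4 are assumptions made for this example and are not taken from the article. Real assemblers also have to handle sequencing errors, repeats, and reverse complements, all of which are ignored here.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:  # count each distinct k-mer only once
                continue
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unambiguous(graph):
    """Spell a contig from the single start node while each node has one successor."""
    targets = {t for successors in graph.values() for t in successors}
    starts = [node for node in graph if node not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one node without incoming edges")
    node = starts[0]
    contig = node
    visited = {node}
    # Stop at a dead end, at a branch, or when a cycle would be re-entered.
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in visited:
            break
        visited.add(node)
        contig += node[-1]
    return contig


# Hypothetical overlapping, error-free reads sampled from one short sequence.
reads = ["ATGGCG", "GGCGTG", "CGTGCA", "TGCAAT"]
contig = walk_unambiguous(de_bruijn_edges(reads, k=4))
print(contig)
```

With these toy reads every (k-1)-mer occurs only once, so the graph is a single chain and the walk recovers the whole sequence. When repeats produce branches, the walk stops at the first ambiguous node and yields only a partial contig.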

Keywords: De Bruijn graph; Hadoop; MapReduce; gene splicing; read.

MeSH terms

Algorithms*
Genome, Bacterial
High-Throughput Nucleotide Sequencing
Sequence Analysis, DNA / methods*
Software