K-Mer Spectrum-Based Error Correction Algorithm for Next-Generation Sequencing Data

Hussah N AlEisa; Safwat Hamad; Ahmed Elhadad

doi:10.1155/2022/8077664

Comput Intell Neurosci. 2022 Jul 14:2022:8077664. doi: 10.1155/2022/8077664. eCollection 2022.

Authors

Hussah N AlEisa¹, Safwat Hamad², Ahmed Elhadad³

Affiliations

¹ Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
² Department of Scientific Computing, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt.
³ Department of Computer Science, Faculty of Computers and Information, South Valley University, Qena, Egypt.

Abstract

In the mid-1970s, the first-generation sequencing technique (Sanger) was created. It has been run on instruments such as Applied Biosystems sequencers and Beckman's GeXP genetic analysis system. Second-generation sequencing (2GS) arrived only a few years after the first human genome was published in 2003. 2GS devices are much faster than Sanger sequencing equipment, with considerably lower costs and far higher throughput in the form of short reads. Third-generation sequencing (3GS), initially introduced in 2005, offers further reduced costs and higher throughput. Despite these successive generations of technology, sequencing remains error-prone, and each run produces an enormous number of reads. Analyzing this massive amount of data will aid in decoding the secrets of life, detecting infections, developing improved crops, and improving quality of life, among other things. This task is challenging, complicated not only by the large number of reads but also by the occurrence of sequencing errors. Error correction, which entails identifying and correcting read errors, is therefore a crucial step in data processing. As demonstrated in this work, the performance of k-spectrum-based error correction algorithms can be influenced by characteristics such as coverage depth, read length, and genome size. Consequently, time and effort must be invested in selecting appropriate error correction methods for a given NGS dataset.
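The core idea behind k-spectrum-based error correction can be illustrated with a minimal Python sketch. It counts every k-mer in the reads (the k-mer spectrum), treats k-mers seen at least a chosen number of times as "solid" (trusted), and repairs a read by trying single-base substitutions until every k-mer in the read is solid. The count threshold, the greedy single-substitution strategy, and the toy genome are assumptions made for illustration. This is a generic example of the k-mer spectrum approach, not the specific algorithms evaluated by the authors.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every k-mer occurring across all reads (the k-mer spectrum)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, threshold):
    """Treat k-mers seen at least `threshold` times as solid (assumed cutoff)."""
    return {kmer for kmer, c in counts.items() if c >= threshold}


def correct_read(read, k, solid, alphabet="ACGT"):
    """Greedy single-substitution correction (illustrative strategy).

    Returns the first single-base substitution, scanning positions left to
    right and bases in `alphabet` order, that makes every k-mer of the read
    solid. Reads that are already fully solid, or that cannot be fixed by
    one substitution, are returned unchanged.
    """
    def all_solid(seq):
        return all(seq[i:i + k] in solid for i in range(len(seq) - k + 1))

    if all_solid(read):
        return read
    for pos in range(len(read)):
        for base in alphabet:
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if all_solid(candidate):
                return candidate
    return read


# Toy example (hypothetical data): five error-free copies of a short
# sequence plus one read carrying a substitution error at position 10.
genome = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
erroneous = genome[:10] + "A" + genome[11:]
reads = [genome] * 5 + [erroneous]

spectrum = kmer_spectrum(reads, k=5)
solid = solid_kmers(spectrum, threshold=3)
print(correct_read(erroneous, 5, solid) == genome)  # True
```

The sketch also hints at why coverage depth matters: the threshold only separates true k-mers from erroneous ones when true k-mers are sampled many times, so at low coverage genuine k-mers may fall below the cutoff and be mistaken for errors.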

MeSH terms

Algorithms*
High-Throughput Nucleotide Sequencing* / methods
Humans
Sequence Analysis, DNA