SCA-NGS: Secure compression algorithm for next generation sequencing data using genetic operators and block sorting

Sci Prog. 2021 Apr-Jun;104(2):368504211023276. doi: 10.1177/00368504211023276.

Abstract

Recent advancements in sequencing methods have led to a significant increase in sequencing data. This increase poses research challenges in storage, transfer, and processing. Data compression techniques have been adopted to cope with the storage of these data, and good results have been achieved in compression ratio and execution time. This fast-paced advancement has raised major concerns about the security of data: confidentiality, integrity, and authenticity need to be ensured. This paper presents a novel lossless, reference-free algorithm that combines data compression with encryption to achieve security in addition to other parameters. The proposed algorithm preprocesses the data before applying a general-purpose compression library. A genetic algorithm is used to encrypt the data. The technique is validated with experimental results on benchmark datasets, and a comparative analysis with state-of-the-art techniques is presented. The results show that the proposed method achieves better results than existing methods.
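To illustrate the general idea of preprocessing followed by a general-purpose compression library, the following is a minimal sketch and not the paper's implementation. It assumes the block sorting named in the title is a Burrows-Wheeler transform and uses Python's zlib as a stand-in compression library. The function names, the "$" sentinel, and the naive rotation sort are illustrative assumptions, and the genetic-operator encryption step is not shown.

```python
import zlib

# Assumed end-of-text marker; it must not occur in the input and must sort
# before every input symbol (true for "$" versus A, C, G, T, N).
SENTINEL = "$"


def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    s = text + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str) -> str:
    """Naive inversion: repeatedly prepend the last column and re-sort."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original text is the row that ends with the sentinel.
    row = next(r for r in table if r.endswith(SENTINEL))
    return row[:-1]


def compress_reads(seq: str) -> bytes:
    """Block-sort the sequence, then hand it to a general-purpose compressor."""
    return zlib.compress(bwt(seq).encode("ascii"), 9)


def decompress_reads(blob: bytes) -> str:
    """Undo the compression, then invert the block sort (lossless round trip)."""
    return inverse_bwt(zlib.decompress(blob).decode("ascii"))
```

The naive transform sorts full rotations and is only practical for short inputs; a real tool would use a suffix-array based construction and process the data in blocks.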

Keywords: NGS data; data compression; encryption; genetic algorithm.

MeSH terms

  • Algorithms
  • Confidentiality
  • Data Compression* / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Operator Regions, Genetic