Parallel and private generalized suffix tree construction and query on genomic data

BMC Genom Data. 2022 Jun 17;23(1):45. doi: 10.1186/s12863-022-01053-x.

Abstract

Background: Technological advances and the digitization of healthcare data have provided the scientific community with a large quantity of genomic data. Such datasets have facilitated a deeper understanding of several diseases and of human health in general. However, these genomic datasets require a large storage volume and present technical challenges in retrieving meaningful information. Furthermore, the privacy concerns surrounding genomic data limit access and often hinder timely scientific discovery.

Methods: In this paper, we utilize the Generalized Suffix Tree (GST), whose construction and applications have been well studied in related areas. The main contribution of this article is a privacy-preserving string query execution framework that uses GSTs and an additional tree-based hashing mechanism. We first introduce an efficient parallel GST construction that scales to large genomic datasets. The secure indexing scheme allows the genomic data in a GST to be outsourced to an untrusted cloud server under encryption. Additionally, the proposed methods can perform several string search operations (e.g., exact and set-maximal matches) securely and efficiently using the outlined framework.
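To make the indexing idea concrete, the following is a minimal, illustrative sketch of a generalized suffix structure with an exact-match query. It is not the authors' implementation: it builds an uncompressed suffix trie sequentially (quadratic space per sequence), and it omits the parallel construction, the encryption, and the tree-based hashing described above. All names (Node, build_gst, exact_match, longest_match) and the toy sequences are hypothetical.

```python
class Node:
    """One node of an uncompressed suffix trie (illustrative only)."""

    def __init__(self):
        self.children = {}    # symbol -> Node
        self.seq_ids = set()  # ids of sequences with a suffix passing through here


def build_gst(sequences):
    """Insert every suffix of every sequence into one shared trie."""
    root = Node()
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            for symbol in seq[start:]:
                node = node.children.setdefault(symbol, Node())
                node.seq_ids.add(seq_id)
    return root


def exact_match(root, query):
    """Return the sorted ids of sequences that contain query as a substring."""
    node = root
    for symbol in query:
        node = node.children.get(symbol)
        if node is None:
            return []
    return sorted(node.seq_ids)


def longest_match(root, query):
    """Return (length, ids) for the longest prefix of query found in any sequence."""
    node, length = root, 0
    for symbol in query:
        nxt = node.children.get(symbol)
        if nxt is None:
            break
        node, length = nxt, length + 1
    return length, sorted(node.seq_ids)


# Toy data, assumed for illustration only.
sequences = ["GATTACA", "TTAC", "CAGAT"]
gst = build_gst(sequences)
print(exact_match(gst, "TTAC"))
print(longest_match(gst, "ACAG"))
```

Because every substring of a sequence is a prefix of one of its suffixes, a query is answered by a single walk from the root, in time proportional to the query length. A production GST would compress single-child paths into labeled edges to keep space linear; the longest_match helper here is only a simplified stand-in for the set-maximal match mentioned above, which has a stricter definition.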

Results: Experiments on different datasets and parameters in a real cloud environment demonstrate the scalability of these methods, which also outperform the state-of-the-art method based on the Burrows-Wheeler Transformation (BWT). The proposed method takes around 36.7 s to execute a set-maximal match, whereas the BWT-based method takes around 160.85 s, a speedup of more than 4×.

Keywords: Outsourcing Genomic Data on Cloud; Parallel Construction of Generalized Suffix Tree; Privacy-preserving Queries on Genomic Data; Reverse Merkle Tree.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cloud Computing*
  • Computer Security
  • Genomics
  • Outsourced Services*
  • Privacy