Cloud-Scale Genomic Signals Processing for Robust Large-Scale Cancer Genomic Microarray Data Analysis

IEEE J Biomed Health Inform. 2017 Jan;21(1):238-245. doi: 10.1109/JBHI.2015.2496323. Epub 2015 Nov 3.

Abstract

As the microarray data available to scientists continue to grow in size and complexity, it has become critically important to find multiple ways of drawing useful oncological inferences for the bioinformatics community from the analysis of large-scale cancer genomic (LSCG) DNA and mRNA microarray data. Although there have been many attempts to support biological interpretation through wavelet preprocessing and classification, no prior research effort has focused on a cloud-scale distributed parallel (CSDP) separable 1-D wavelet decomposition technique for denoising, through differential expression thresholding, and for classifying LSCG microarray data. This research presents a novel methodology that uses a CSDP separable 1-D method for wavelet-based transformation to initialize a threshold that retains significantly expressed genes through the denoising process, enabling robust classification of cancer patients. The entire study was implemented within a CSDP environment. Cloud computing and wavelet-based thresholding for denoising were used to classify samples from the Global Cancer Map, the Cancer Cell Line Encyclopedia, and The Cancer Genome Atlas. The results showed that separable 1-D parallel distributed wavelet denoising in the cloud, combined with differential expression thresholding, improved computational performance and produced higher quality LSCG microarray datasets, which in turn led to more accurate classification results.
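The general idea of applying a 1-D wavelet transform independently to each row of an expression matrix, shrinking small detail coefficients, and processing rows in parallel can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single-level orthonormal Haar wavelet, a plain hard threshold `tau` on detail coefficients as a stand-in for the paper's differential expression threshold, and a local thread pool in place of a cloud-scale cluster. The function names (`haar_forward`, `haar_inverse`, `denoise_row`, `denoise_matrix`) and the parameter `tau` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT; requires an even-length signal."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Reconstruct the signal from one level of Haar coefficients."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def hard_threshold(coeffs: Sequence[float], tau: float) -> List[float]:
    """Zero every coefficient whose magnitude does not exceed tau (assumed rule)."""
    return [c if abs(c) > tau else 0.0 for c in coeffs]


def denoise_row(row: Sequence[float], tau: float) -> List[float]:
    """Denoise one 1-D expression profile: transform, threshold details, invert."""
    approx, detail = haar_forward(row)
    return haar_inverse(approx, hard_threshold(detail, tau))


def denoise_matrix(matrix: Sequence[Sequence[float]], tau: float,
                   workers: int = 4) -> List[List[float]]:
    """Rows are independent, so each 1-D transform can run in parallel.
    A thread pool stands in here for distributing rows across cloud nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: denoise_row(r, tau), matrix))
```

For example, `denoise_matrix([[4.0, 4.0, 2.0, 2.2]], tau=0.5)` smooths the small fluctuation between the last two values to roughly `[4.0, 4.0, 2.1, 2.1]`, while a large contrast such as the one in `[1.0, 5.0, 3.0, 3.0]` survives the threshold and is reconstructed essentially unchanged. Because each row is transformed independently, the work partitions naturally across workers, which is the property a distributed deployment would exploit.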

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Cell Line, Tumor
  • Cloud Computing
  • Databases, Genetic
  • Genomics / methods*
  • Humans
  • Neoplasms / genetics*
  • Neoplasms / metabolism
  • Oligonucleotide Array Sequence Analysis / methods*
  • Signal Processing, Computer-Assisted*