A divide-and-conquer strategy to solve the out-of-memory problem of processing thousands of Affymetrix microarrays

Chia-Ju Lee; Dong Fu; Pan Du; Hongmei Jiang; Simon M Lin; Warren Kibbe

doi:10.1504/ijcbdd.2008.022209

Int J Comput Biol Drug Des. 2008;1(4):396-405. doi: 10.1504/ijcbdd.2008.022209.

Authors

Chia-Ju Lee¹, Dong Fu, Pan Du, Hongmei Jiang, Simon M Lin, Warren Kibbe

Affiliation

¹ Computational Biology and Bioinformatics Program, Northwestern University, Evanston, IL 60208, USA. ChiaJuLee2008@u.northwestern.edu

PMID: 20063464

Abstract

Out-of-memory problems were frequently encountered when processing thousands of CEL files with Bioconductor. We propose a divide-and-conquer strategy combined with randomised resampling to solve this problem. The CAMDA 2007 META-analysis data set, which contains 5896 CEL files, was used to test the approach on a typical commodity computer cluster by running established pre-processing algorithms for Affymetrix arrays implemented in Bioconductor. The results were validated against a gold standard obtained using a supercomputer. In addition to improving performance, the general divide-and-conquer strategy can be applied to any other normalisation algorithm without modifying the underlying implementation.
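
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: randomly assign the files to subsets small enough to fit in memory, run an unmodified processing routine on each subset, and merge the per-subset results. It is written in Python for illustration, although the original work used Bioconductor (R). The names `random_partition`, `process_in_chunks`, `toy_summarise`, and `TOY_INTENSITIES`, the chunk sizes, and the median-centring stand-in are all assumptions made here and are not taken from the paper.

```python
import random
from statistics import median
from typing import Callable, Dict, List, Sequence


def random_partition(items: Sequence[str], chunk_size: int, seed: int = 0) -> List[List[str]]:
    """Shuffle items reproducibly and split them into chunks of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # randomised assignment to subsets
    return [shuffled[i:i + chunk_size] for i in range(0, len(shuffled), chunk_size)]


def process_in_chunks(
    items: Sequence[str],
    chunk_size: int,
    process_chunk: Callable[[List[str]], Dict[str, float]],
    seed: int = 0,
) -> Dict[str, float]:
    """Apply process_chunk to each random subset and merge the results.

    process_chunk is treated as a black box, so only one subset
    needs to be held in memory at a time.
    """
    results: Dict[str, float] = {}
    for chunk in random_partition(items, chunk_size, seed):
        results.update(process_chunk(chunk))
    return results


# Hypothetical toy data standing in for CEL files (assumption, not real data).
TOY_INTENSITIES = {
    f"array_{i:04d}.CEL": [float(i + j) for j in range(5)] for i in range(10)
}


def toy_summarise(chunk: List[str]) -> Dict[str, float]:
    """Stand-in for a real pre-processing call: centre each array's median
    on the median of all intensities in its chunk."""
    chunk_centre = median(v for name in chunk for v in TOY_INTENSITIES[name])
    return {name: median(TOY_INTENSITIES[name]) - chunk_centre for name in chunk}


if __name__ == "__main__":
    summary = process_in_chunks(list(TOY_INTENSITIES), 4, toy_summarise, seed=0)
    print(len(summary), "arrays processed")
```

Because each subset is normalised on its own, values from different subsets are only approximately comparable; random assignment is one way to make the subsets resemble each other, which is why results from such a scheme need to be checked against a reference computed on the full data set.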

MeSH terms

Biostatistics / methods*
Computational Biology / methods*
Computer Storage Devices
Computers
Equipment Design
Humans
Oligonucleotide Array Sequence Analysis / methods*
Problem Solving