Analysis of boutique arrays: a universal method for the selection of the optimal data normalization procedure

Int J Mol Med. 2013 Sep;32(3):668-84. doi: 10.3892/ijmm.2013.1443. Epub 2013 Jul 15.

Abstract

DNA microarrays, which are among the most popular genomic tools, are widely applied in biology and medicine. Boutique arrays, which are small, spotted, dedicated microarrays, constitute an inexpensive alternative to whole-genome screening methods. The data extracted from each microarray-based experiment must be transformed and processed prior to further analysis to eliminate any technical bias. Normalization is the most crucial step of microarray data pre-processing and must be carefully considered, as it has a profound effect on the results of the analysis. Several normalization algorithms have been developed and implemented in data analysis software packages. However, most of these methods were designed for whole-genome analysis. In this study, we tested 13 normalization strategies (ten for double-channel data and three for single-channel data) available in R/Bioconductor and compared their effectiveness in the normalization of four boutique array datasets. The results revealed that boutique arrays can be successfully normalized using standard methods, but not every method is suitable for every dataset. We also suggest a universal seven-step workflow that can be applied for the selection of the optimal normalization procedure for any boutique array dataset. The described workflow enables the evaluation of the investigated normalization methods based on the bias and variance values for the control probes, a differential expression analysis and a receiver operating characteristic curve analysis. The analysis of each component results in a separate ranking of the normalization methods. Combining the ranks obtained for all the normalization procedures facilitates the selection of the most appropriate normalization method for the studied dataset and determines which methods can be used interchangeably.
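The rank-combination idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the per-component ranks are combined, so the mean rank used here is an assumption, as are the function names (`rank_scores`, `combine_rankings`), the placeholder method names and every number in the example, none of which come from the study.

```python
from statistics import mean


def rank_scores(scores, higher_is_better=False):
    """Rank methods for one criterion (1 = best); ties share the average rank."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j over the run of methods tied with the one at position i.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        avg_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = avg_rank
        i = j + 1
    return ranks


def combine_rankings(criteria):
    """Average the per-criterion ranks (an assumed combination rule).

    criteria maps a criterion name to (scores, higher_is_better).
    Returns (method, mean rank) pairs sorted from best to worst.
    """
    per_criterion = {
        name: rank_scores(scores, hib) for name, (scores, hib) in criteria.items()
    }
    methods = next(iter(per_criterion.values())).keys()
    combined = {m: mean(r[m] for r in per_criterion.values()) for m in methods}
    return sorted(combined.items(), key=lambda kv: kv[1])


# Illustrative, made-up scores for three placeholder normalization methods.
criteria = {
    "control_probe_bias": (
        {"method_A": 0.10, "method_B": 0.30, "method_C": 0.20}, False),
    "control_probe_variance": (
        {"method_A": 0.50, "method_B": 0.40, "method_C": 0.60}, False),
    "roc_auc": (
        {"method_A": 0.90, "method_B": 0.80, "method_C": 0.85}, True),
}

ranking = combine_rankings(criteria)
for method, mean_rank in ranking:
    print(f"{method}: mean rank {mean_rank:.2f}")
```

In this toy example, methods that end up with the same mean rank would be candidates for interchangeable use, although the criterion the authors actually apply for interchangeability is not given in the abstract.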

MeSH terms

  • Animals
  • Asthma / genetics
  • Computational Biology / methods
  • Data Interpretation, Statistical
  • Genomics / methods
  • Humans
  • Hypersensitivity / genetics
  • Leukemia, Myeloid, Acute / genetics
  • Mice
  • Oligonucleotide Array Sequence Analysis / methods*