Comparison of pre-processing methods for Infinium HumanMethylation450 BeadChip array

Bioinformatics. 2017 Oct 15;33(20):3151-3157. doi: 10.1093/bioinformatics/btx372.

Abstract

Motivation: Microarrays are widely used to quantify DNA methylation because they are economical, require only small quantities of input DNA and focus on well-characterized regions of the genome. However, pre-processing of methylation microarray data is challenging because of confounding factors that include background fluorescence, dye bias and the impact of germline polymorphisms. We therefore present a data-driven framework, and the insights derived from it, for those seeking the optimal pre-processing method.

Results: Here, we show that, in prostate cancer, Dasen is the optimal pre-processing methodology for the Infinium HumanMethylation450 BeadChip array, a platform frequently employed for tumour methylome profiling in both the TCGA and ICGC consortia. We evaluated the impact of 11 pre-processing methods on batch effects, replicate variabilities, sensitivities and sample-to-sample correlations across 809 independent prostate cancer samples, including 150 reported for the first time in this study. Overall, Dasen is the most effective for removing artefacts and detecting biological differences associated with tumour aggressivity. Relative to the raw dataset, it reduces replicate variances by 67% for β-values and 76% for M-values. Our study provides a unique pre-processing benchmark for the community with an emphasis on biological implications.
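
The abstract reports replicate variances on two scales, β-values and M-values. The sketch below illustrates how the two scales relate and how a percentage reduction in replicate variance could be computed. It uses the standard logit relationship M = log2(β / (1 - β)). The abstract does not state how replicate variance was summarised, so the averaging over probes, the function names and the `eps` clipping parameter are all assumptions made for illustration, not the authors' procedure.

```python
import math
from statistics import variance


def beta_to_m(beta, eps=1e-6):
    """Convert a beta-value in [0, 1] to an M-value, log2(beta / (1 - beta)).

    `eps` is a hypothetical clipping parameter that keeps the logit finite
    when beta is exactly 0 or 1.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def mean_replicate_variance(probes):
    """Average the per-probe sample variance across technical replicates.

    `probes` is a list of lists: one inner list of replicate measurements
    per probe. Averaging over probes is an assumed summary statistic.
    """
    per_probe = [variance(values) for values in probes]
    return sum(per_probe) / len(per_probe)


def percent_reduction(raw_variance, processed_variance):
    """Percentage by which pre-processing lowers variance relative to raw."""
    return 100.0 * (raw_variance - processed_variance) / raw_variance
```

Under these assumptions, a processed replicate variance of 0.33 against a raw variance of 1.0 corresponds to a 67% reduction. Because the logit stretches values near 0 and 1, the same data can show different reductions on the β and M scales, which is consistent with the abstract reporting the two figures separately.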

Availability and implementation: All software used in this study is publicly available as detailed in the article.

Contact: paul.boutros@oicr.on.ca.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Comparative Study

MeSH terms

  • Adenocarcinoma / genetics
  • Adenocarcinoma / metabolism
  • CpG Islands*
  • DNA Methylation*
  • Genome, Human
  • Genomics / methods
  • Humans
  • Male
  • Oligonucleotide Array Sequence Analysis / methods*
  • Polymorphism, Genetic
  • Prostatic Neoplasms / genetics
  • Prostatic Neoplasms / metabolism
  • Sensitivity and Specificity
  • Sequence Analysis, DNA / methods*
  • Software*