An Efficient Approach to Screening Epigenome-Wide Data

Biomed Res Int. 2016;2016:2615348. doi: 10.1155/2016/2615348. Epub 2016 Mar 13.

Abstract

Because epigenome-wide data are high-dimensional, it is desirable to screen cytosine-phosphate-guanine dinucleotide (CpG) DNA methylation sites for association with one or more covariates. We incorporate surrogate variable analysis (SVA) into ordinary or robust linear regression and use training and testing samples for nested validation to screen CpG sites. SVA accounts for variation in methylation that is not explained by the specified covariate(s) and adjusts for confounding effects. To make the method easy to use, it is implemented with efficient algorithms in a user-friendly R package, ttScreening. Simulations were conducted to examine the robustness and sensitivity of the method in comparison with classical approaches that control for multiple testing: the false discovery rate-based (FDR-based) and Bonferroni-based methods. The proposed approach generally performs better and has the potential to control both type I and type II errors. We applied ttScreening to 383,998 CpG sites in association with maternal smoking, one of the leading risk factors for cancer.
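The train/test screening idea summarized above can be illustrated with a small sketch. This Python code is not the ttScreening package (which is written in R) and does not reproduce its API. It is a minimal illustration under several assumptions: surrogate variables are approximated by the top singular vectors of the covariate-adjusted residuals (a simplified stand-in for full SVA), they are estimated once on all samples, ordinary least squares is used rather than robust regression, and p-values use a normal approximation to the t distribution. The split fraction, number of iterations, and 50% selection cutoff are illustrative choices, not necessarily the package's defaults. For comparison, the sketch also applies the Bonferroni and Benjamini-Hochberg (FDR) rules to full-sample p-values. All function names (`tt_screen`, `surrogate_variables`, `simulate`, and so on) are hypothetical.

```python
# Hypothetical sketch of train/test CpG screening; NOT the ttScreening API.
import math

import numpy as np


def ols_pvalues(Y, X, j):
    """Two-sided p-values for coefficient j of design X, fitted separately to
    each column (CpG site) of Y. Assumes a normal approximation to the t
    distribution, which is reasonable only for moderately large samples."""
    n, q = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ Y                      # q x p coefficients
    resid = Y - X @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - q)   # per-site residual variance
    t = beta[j] / np.sqrt(sigma2 * XtX_inv[j, j])
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])


def surrogate_variables(Y, x, k):
    """Simplified stand-in for SVA (assumption): the top-k left singular
    vectors of the residuals after regressing every site on intercept + x."""
    D = np.column_stack([np.ones_like(x), x])
    resid = Y - D @ np.linalg.lstsq(D, Y, rcond=None)[0]   # n x p
    U, _, _ = np.linalg.svd(resid, full_matrices=False)
    return U[:, :k]                                        # n x k


def tt_screen(Y, x, k=2, n_iter=50, train_frac=2 / 3, alpha=0.05, keep=0.5, seed=0):
    """A site is a hit in one iteration if its covariate effect has p < alpha
    in BOTH the training and the testing subset; sites that are hits in at
    least `keep` of the iterations are selected. Defaults are illustrative."""
    rng = np.random.default_rng(seed)
    n = len(x)
    X = np.column_stack([np.ones(n), x, surrogate_variables(Y, x, k)])
    n_tr = int(round(train_frac * n))
    hits = np.zeros(Y.shape[1])
    for _ in range(n_iter):
        idx = rng.permutation(n)                  # random train/test split
        tr, te = idx[:n_tr], idx[n_tr:]
        p_tr = ols_pvalues(Y[tr], X[tr], j=1)     # column 1 is the covariate
        p_te = ols_pvalues(Y[te], X[te], j=1)
        hits += (p_tr < alpha) & (p_te < alpha)
    return np.flatnonzero(hits / n_iter >= keep)


def bonferroni(p, alpha=0.05):
    """Indices of sites passing the Bonferroni threshold alpha / m."""
    return np.flatnonzero(p < alpha / len(p))


def benjamini_hochberg(p, q=0.05):
    """Indices of sites rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.flatnonzero(below))             # largest rank meeting the bound
    return np.sort(order[: k + 1])


def simulate(seed=1, n=90, p_sites=205, n_true=5, effect=2.0):
    """Toy data (samples x sites): a binary exposure x, one batch-like latent
    factor shared across sites, and a true exposure effect at sites 0..n_true-1."""
    rng = np.random.default_rng(seed)
    x = np.repeat([0.0, 1.0], n // 2)
    batch = np.tile([-1.0, 1.0], n // 2)          # unmeasured, roughly balanced in x
    loadings = rng.normal(size=p_sites)
    Y = rng.normal(size=(n, p_sites)) + np.outer(batch, loadings)
    Y[:, :n_true] += effect * x[:, None]
    return Y, x, n_true


if __name__ == "__main__":
    Y, x, n_true = simulate()
    X_full = np.column_stack([np.ones(len(x)), x, surrogate_variables(Y, x, 2)])
    p_full = ols_pvalues(Y, X_full, j=1)
    print("train/test:", tt_screen(Y, x))
    print("Bonferroni:", bonferroni(p_full))
    print("BH (FDR):  ", benjamini_hochberg(p_full))
```

In this sketch, requiring a site to reach significance in both the training and the testing subset, across many random splits, is what filters out sites that look significant by chance in a single fit; the Bonferroni and FDR rules instead adjust one set of p-values computed on all samples.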

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Computational Biology
  • CpG Islands / genetics*
  • DNA Methylation / genetics*
  • Epigenomics / statistics & numerical data*
  • Genome, Human
  • Humans
  • Linear Models
  • Neoplasms / genetics*
  • Oligonucleotide Array Sequence Analysis
  • Risk Factors