Statistical Methods for Methylation Data

Methods Mol Biol. 2017:1589:185-203. doi: 10.1007/7651_2015_316.

Abstract

Methylation data are continuous variables, with most values in a sample lying in a narrow range. In a research project they can be either the outcome or a variable that potentially explains some of the variation in other outcomes. A range of statistical methods is appropriate, depending on the experimental question. Before the formal analysis is carried out, it is important that the data are checked and cleaned. Where batch effects may be present, they should be accounted for in the analysis. Where many methylation sites are investigated in a study, attention should be given to multiple comparisons and false discovery rates, and multivariate methods such as principal component analysis may be useful.

Keywords: Batch effects; Linear model; Principal component analysis; Regression; Statistical power.
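
The abstract's point about multiple comparisons and false discovery rates can be illustrated with a minimal sketch. The abstract does not name a specific procedure; the Benjamini-Hochberg adjustment is assumed here as one common choice. The function name benjamini_hochberg, the example p-values, and the 0.05 threshold are hypothetical and are not taken from the chapter.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: one p-value per methylation site, each from the same
    kind of test. This is an illustration, not the chapter's own method.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five methylation sites.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)

# Sites declared significant at an assumed false discovery rate of 5%.
significant = [qi <= 0.05 for qi in q]
```

With only a handful of sites the adjustment changes little, but when thousands of sites are tested it sharply reduces the number of findings expected to arise by chance alone.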

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA Methylation*
  • Data Interpretation, Statistical*
  • Gene Expression Profiling
  • Humans
  • Models, Statistical*
  • Principal Component Analysis / methods*