A linear time algorithm for detecting long genomic regions enriched with a specific combination of epigenetic states

BMC Genomics. 2015;16 Suppl 2(Suppl 2):S8. doi: 10.1186/1471-2164-16-S2-S8. Epub 2015 Jan 21.

Abstract

Background: Epigenetic modifications are essential for controlling gene expression. Recent studies have shown that not only single epigenetic modifications but also combinations of multiple epigenetic modifications play vital roles in gene regulation. A striking example is the long hypomethylated regions enriched with modified H3K27me3 (called, "K27HMD" regions), which are exposed to suppress the expression of key developmental genes relevant to cellular development and differentiation during embryonic stages in vertebrates. It is thus a biologically important issue to develop an effective optimization algorithm for detecting long DNA regions (e.g., >4 kbp in size) that harbor a specific combination of epigenetic modifications (e.g., K27HMD regions). However, to date, optimization algorithms for these purposes have received little attention, and available methods are still heuristic and ad hoc.

Results: In this paper, we propose a linear time algorithm for calculating a set of non-overlapping regions that maximizes the sum of similarities between the vector of focal epigenetic states and the vectors of raw epigenetic states at DNA positions in the set of regions. The average elapsed time to process the epigenetic data of any of human chromosomes was less than 2 seconds on an Intel Xeon CPU. To demonstrate the effectiveness of the algorithm, we estimated large K27HMD regions in the medaka and human genomes using our method, ChromHMM, and a heuristic method.

Conclusions: We confirmed that the advantages of our method over those of the two other methods. Our method is flexible enough to handle other types of epigenetic combinations. The program that implements the method is called "CSMinfinder" and is made available at: http://mlab.cb.k.u-tokyo.ac.jp/~ichikawa/Segmentation/

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Computational Biology / methods*
  • Epigenesis, Genetic*
  • Epigenomics / methods*
  • Genomics / methods
  • Histones / metabolism
  • Humans
  • Internet
  • Lysine / metabolism
  • Methylation
  • Oryzias / genetics
  • Oryzias / metabolism
  • Reproducibility of Results
  • Software

Substances

  • Histones
  • Lysine