CMT: a constrained multi-level thresholding approach for ChIP-Seq data analysis

PLoS One. 2014 Apr 15;9(4):e93873. doi: 10.1371/journal.pone.0093873. eCollection 2014.

Abstract

Genome-wide profiling of DNA-binding proteins using ChIP-Seq has emerged as an alternative to ChIP-chip methods. ChIP-Seq technology offers many advantages over ChIP-chip arrays, including but not limited to less noise, higher resolution, and greater coverage. Several algorithms have been developed to take advantage of these capabilities and to find enriched regions in ChIP-Seq data. However, the complexity of the diverse signal patterns in ChIP-Seq data still calls for new algorithms. Most current algorithms rely on various heuristics to detect enriched regions accurately, yet despite the many formulations available, it remains difficult to accurately determine the individual peaks that correspond to each binding event. We developed Constrained Multi-level Thresholding (CMT), an algorithm for detecting enriched regions in ChIP-Seq data. CMT employs a constraint-based module that can target regions within a specific range. By objectively assessing its performance against previously proposed peak finders, we show that CMT detects enriched regions (peaks) with higher accuracy. We demonstrate this by testing three algorithms on the well-known FoxA1 dataset, on four transcription factors (with a total of six antibodies) for Drosophila melanogaster, and on the H3K4ac antibody dataset.
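The abstract does not describe CMT's internals. As a rough illustration only, the Python sketch below shows one generic way that thresholding at several levels can be combined with a constraint so that only regions within a chosen size range are reported. It assumes the constraint applies to region width (in bins), and the function names (`runs_above`, `call_regions`), the highest-level-first ordering, and the non-overlap rule are all hypothetical choices for illustration, not the published CMT algorithm.

```python
from typing import List, Optional, Sequence, Tuple

# A region is a half-open interval [start, end) of bin indices.
Region = Tuple[int, int]


def runs_above(signal: Sequence[float], threshold: float) -> List[Region]:
    """Return maximal runs of consecutive bins whose value is >= threshold."""
    runs: List[Region] = []
    start: Optional[int] = None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i  # a run begins
        elif value < threshold and start is not None:
            runs.append((start, i))  # a run ends just before bin i
            start = None
    if start is not None:
        runs.append((start, len(signal)))  # run extends to the last bin
    return runs


def call_regions(
    signal: Sequence[float],
    levels: Sequence[float],
    min_width: int,
    max_width: int,
) -> List[Region]:
    """Hypothetical constrained multi-level thresholding (illustration only).

    Thresholds are applied from highest to lowest. At each level, a run is
    accepted only if its width lies in [min_width, max_width] (the assumed
    constraint) and it does not overlap a region accepted at a higher level.
    """
    accepted: List[Region] = []
    for level in sorted(levels, reverse=True):
        for start, end in runs_above(signal, level):
            width = end - start
            if not (min_width <= width <= max_width):
                continue  # outside the target width range
            if any(start < e and s < end for s, e in accepted):
                continue  # overlaps a region found at a stricter level
            accepted.append((start, end))
    return sorted(accepted)


if __name__ == "__main__":
    coverage = [0, 1, 5, 9, 8, 2, 0, 0, 3, 4, 3, 1, 0]
    print(call_regions(coverage, levels=[2, 7], min_width=2, max_width=4))
```

With this toy coverage vector, the strict level (7) yields the narrow summit `(3, 5)`; at the lower level (2), the broader run `(2, 6)` is skipped because it overlaps that summit, while the separate run `(8, 11)` is accepted, giving `[(3, 5), (8, 11)]`. Preferring regions found at higher thresholds is one simple way to keep adjacent binding events apart, but other merging rules are equally plausible.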

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Chromatin Immunoprecipitation*
  • Chromosomes, Insect
  • Computational Biology / methods*
  • Computational Biology / standards*
  • DNA-Binding Proteins / genetics
  • DNA-Binding Proteins / metabolism
  • Datasets as Topic
  • Drosophila melanogaster / genetics
  • Drosophila melanogaster / metabolism
  • Genomics
  • High-Throughput Nucleotide Sequencing*
  • ROC Curve
  • Sequence Analysis, DNA*

Substances

  • DNA-Binding Proteins

Grants and funding

This research has been funded by the Natural Sciences and Engineering Research Council of Canada, NSERC (www.nserc.ca). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.