Using combined evidence from replicates to evaluate ChIP-seq peaks

Vahid Jalili; Matteo Matteucci; Marco Masseroli; Marco J Morelli

doi:10.1093/bioinformatics/btv293

Bioinformatics. 2015 Sep 1;31(17):2761-9. doi: 10.1093/bioinformatics/btv293. Epub 2015 May 7.

Authors

Vahid Jalili¹, Matteo Matteucci¹, Marco Masseroli¹, Marco J Morelli²

Affiliations

¹ Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy.
² Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia (IIT), 20139 Milan, Italy.

PMID: 25957351
DOI: 10.1093/bioinformatics/btv293

Abstract

Motivation: Chromatin Immunoprecipitation followed by sequencing (ChIP-seq) detects genome-wide DNA-protein interactions and chromatin modifications, returning enriched regions (ERs), usually associated with a significance score. Moderately significant interactions can correspond to true, weak interactions, or to false positives; replicates of a ChIP-seq experiment can provide co-localised evidence to decide between the two cases. We designed a general methodological framework to rigorously combine the evidence of ERs in ChIP-seq replicates, with the option to set a significance threshold on the repeated evidence and a minimum number of samples bearing this evidence.
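The idea of combining co-localised evidence can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the combination rule, so Fisher's method for combining independent p-values is assumed here, and the function names, the tuple layout `(chrom, start, end, p_value)`, and the default threshold values are all hypothetical.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method."""
    k = len(p_values)
    # Fisher statistic X = -2 * sum(ln p_i) follows a chi-square
    # distribution with 2k degrees of freedom; we keep X / 2.
    half_x = -sum(math.log(p) for p in p_values)
    # Closed-form chi-square survival function for even degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def evaluate_er(er, other_replicates, combined_threshold=1e-8, min_samples=2):
    """Decide whether an ER is supported by enough replicates.

    er: (chrom, start, end, p_value) from one replicate.
    other_replicates: list of ER lists, one per additional replicate.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    supporting = [er[3]]
    for replicate in other_replicates:
        hits = [r[3] for r in replicate if overlaps(er, r)]
        if hits:
            # Assumption: use the most significant overlapping ER.
            supporting.append(min(hits))
    if len(supporting) < min_samples:
        return False
    return fisher_combined_pvalue(supporting) <= combined_threshold


weak_er = ("chr1", 100, 200, 1e-5)
replicate_2 = [("chr1", 150, 250, 1e-5)]
confirmed = evaluate_er(weak_er, [replicate_2])
unsupported = evaluate_er(weak_er, [[("chr2", 150, 250, 1e-5)]])
```

In this sketch an ER that is only moderately significant in each sample can still pass when its overlapping counterparts jointly reach the combined threshold, while an ER with no co-localised evidence is rejected by the minimum-sample requirement.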

Results: We applied our method to Myc transcription factor ChIP-seq datasets in K562 cells available in the ENCODE project. Using replicates, we could increase the number of ERs up to 3-fold with respect to single-sample analysis with an equivalent significance threshold. We validated the 'rescued' ERs by checking for overlap with open chromatin regions and for enrichment of the motif that Myc binds with strongest affinity; we compared our results with alternative methods (IDR and jMOSAiCS), obtaining more validated peaks than the former and fewer peaks than the latter, but with better validation.
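The open-chromatin check can be illustrated with a small sketch that computes the fraction of ERs overlapping at least one open chromatin region. The function name, the `(chrom, start, end)` tuple layout with half-open coordinates, and the toy intervals are assumptions made for illustration; they are not data or code from the paper.

```python
def fraction_overlapping(ers, open_regions):
    """Fraction of ERs that overlap at least one open chromatin region.

    Both arguments are lists of (chrom, start, end) half-open intervals.
    A quadratic scan is used for clarity; large inputs would call for
    sorted intervals or an interval tree.
    """
    if not ers:
        return 0.0
    hits = 0
    for chrom, start, end in ers:
        if any(c == chrom and s < end and start < e for c, s, e in open_regions):
            hits += 1
    return hits / len(ers)


# Toy intervals, invented for this example only.
rescued_ers = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
open_chromatin = [("chr1", 150, 300), ("chr2", 300, 400)]
validated_fraction = fraction_overlapping(rescued_ers, open_chromatin)
```

A higher fraction for the rescued ERs than for a background set of regions would count as supporting evidence under this kind of check.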

Availability and implementation: An implementation of the proposed method and its source code under GPLv3 license are freely available at http://www.bioinformatics.deib.polimi.it/MSPC/ and http://mspc.codeplex.com/, respectively.

Contact: marco.morelli@iit.it

Supplementary information: Supplementary material is available at Bioinformatics online.

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Chromatin / genetics
Chromatin / metabolism*
Chromatin Immunoprecipitation / methods*
Computational Biology / methods
Data Interpretation, Statistical
Gene Expression Regulation
Genome, Human*
High-Throughput Nucleotide Sequencing*
Humans
K562 Cells
Nucleotide Motifs / genetics
Protein Binding
Protein Structure, Tertiary
Proto-Oncogene Proteins c-myc / genetics
Proto-Oncogene Proteins c-myc / metabolism
Quality Control
Reproducibility of Results
Sequence Analysis, DNA
Software
Transcription Factors / metabolism*
Ubiquitin-Protein Ligases / genetics
Ubiquitin-Protein Ligases / metabolism*

Substances

Chromatin
MYC protein, human
Proto-Oncogene Proteins c-myc
Transcription Factors
STUB1 protein, human
Ubiquitin-Protein Ligases