GenoGAM: genome-wide generalized additive models for ChIP-Seq analysis

Georg Stricker; Alexander Engelhardt; Daniel Schulz; Matthias Schmid; Achim Tresch; Julien Gagneur

doi:10.1093/bioinformatics/btx150

Bioinformatics. 2017 Aug 1;33(15):2258-2265. doi: 10.1093/bioinformatics/btx150.

Authors

Georg Stricker¹ ², Alexander Engelhardt¹, Daniel Schulz¹, Matthias Schmid³, Achim Tresch⁴, Julien Gagneur¹ ²

Affiliations

¹ Gene Center and Department of Biochemistry, Ludwig-Maximilians-Universität München, 80333 Munich, Germany.
² Department of Informatics, Technische Universität München, 85748 Garching, Germany.
³ Institut für Medizinische Biometrie, Informatik und Epidemiologie, University Hospital Bonn, 53105 Bonn, Germany.
⁴ Institute for Genetics, University of Cologne, 50647 Cologne, Germany.

PMID: 28369277
DOI: 10.1093/bioinformatics/btx150

Abstract

Motivation: Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) is a widely used approach to study protein-DNA interactions. Often, the quantities of interest are the differential occupancies relative to controls, between genetic backgrounds, treatments, or combinations thereof. However, current methods for differential occupancy analysis of ChIP-Seq data rely on binning or sliding-window techniques, for which the choice of window and bin sizes is subjective.

Results: Here, we present GenoGAM (Genome-wide Generalized Additive Model), which brings the well-established and flexible generalized additive models framework to genomic applications using a data parallelism strategy. We model ChIP-Seq read count frequencies as products of smooth functions along chromosomes. Smoothing parameters are objectively estimated from the data by cross-validation, eliminating ad hoc binning and windowing needed by current approaches. GenoGAM provides base-level and region-level significance testing for full factorial designs. Application to a ChIP-Seq dataset in yeast showed increased sensitivity over existing differential occupancy methods while controlling for type I error rate. By analyzing a set of DNA methylation data and illustrating an extension to a peak caller, we further demonstrate the potential of GenoGAM as a generic statistical modeling tool for genome-wide assays.
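To make the modeling idea concrete, the following is a minimal sketch in Python and not the GenoGAM implementation (GenoGAM itself is an R/Bioconductor package). It assumes a log link, so that a product of smooth functions becomes a sum on the log scale: log(mu) = f_background(x) + z * f_enrichment(x), with z = 1 for the IP sample and z = 0 for the control. Several simplifications are assumptions of this sketch rather than statements from the abstract: counts are simulated, the count distribution is Poisson, Gaussian bumps stand in for a spline basis, a single smoothing parameter is shared by both smooth functions, and cross-validation holds out random positions. All function and variable names are hypothetical.

```python
import numpy as np

def bump_basis(x, centers, width):
    """Gaussian bump basis, one column per center (a stand-in for a spline basis)."""
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

def make_penalty(k, lam):
    """Second-difference penalty on each block of k coefficients, plus a tiny ridge."""
    d = np.diff(np.eye(k), n=2, axis=0)        # rows: b[i+2] - 2*b[i+1] + b[i]
    p = d.T @ d
    zero = np.zeros((k, k))
    return lam * np.block([[p, zero], [zero, p]]) + 1e-6 * np.eye(2 * k)

def fit_penalized_poisson(X, y, penalty, n_iter=25):
    """Penalized Poisson regression with log link, fitted by IRLS."""
    eta = np.log(y + 0.5)                      # standard GLM starting values
    for _ in range(n_iter):
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                # working response
        A = X.T @ (mu[:, None] * X) + penalty  # weights equal mu for Poisson
        beta = np.linalg.solve(A, X.T @ (mu * z))
        eta = np.clip(X @ beta, -20.0, 20.0)
    return beta

def cv_score(X, y, lam, k, n_folds=5, seed=1):
    """Held-out Poisson log-likelihood (up to a constant), summed over folds."""
    fold = np.random.default_rng(seed).integers(n_folds, size=len(y))
    pen = make_penalty(k, lam)
    score = 0.0
    for f in range(n_folds):
        train, test = fold != f, fold == f
        beta = fit_penalized_poisson(X[train], y[train], pen)
        eta = X[test] @ beta
        score += np.sum(y[test] * eta - np.exp(eta))
    return score

# Simulated data (an assumption of this sketch): one control and one IP track.
rng = np.random.default_rng(0)
x = np.arange(400.0)
background = 5.0 * (1.0 + 0.3 * np.sin(x / 60.0))
true_fold = 1.0 + 3.0 * np.exp(-0.5 * ((x - 200.0) / 25.0) ** 2)
y_control = rng.poisson(background)
y_ip = rng.poisson(background * true_fold)

# Design matrix: control rows see only f_background, IP rows see both smooths.
centers = np.arange(-30.0, 421.0, 15.0)
k = len(centers)
B = bump_basis(x, centers, 15.0)
X = np.vstack([np.hstack([B, np.zeros_like(B)]),
               np.hstack([B, B])])
y = np.concatenate([y_control, y_ip])

# Choose the smoothing parameter by cross-validation, then refit on all data.
lambdas = [0.1, 1.0, 10.0, 100.0, 1000.0]
scores = [cv_score(X, y, lam, k) for lam in lambdas]
best_lam = lambdas[int(np.argmax(scores))]
beta = fit_penalized_poisson(X, y, make_penalty(k, best_lam))
fold_change = np.exp(B @ beta[k:])             # estimated IP/control ratio per position
print(best_lam, fold_change[200], fold_change[60])
```

The estimated ratio exp(f_enrichment(x)) is defined at every position, which is how a smooth-function model avoids fixing a bin or window size in advance; the amount of smoothing is instead selected by the held-out likelihood.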

Availability and implementation: Software is available from Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/GenoGAM.html

Contact: gagneur@in.tum.de.

Supplementary information: Supplementary information is available at Bioinformatics online.

MeSH terms

Animals
Chromatin Immunoprecipitation / methods*
DNA Methylation*
Genomics / methods
High-Throughput Nucleotide Sequencing / methods*
Humans
Mice
Models, Biological
Models, Statistical*
Sequence Analysis, DNA / methods
Software*
Yeasts / genetics