Bayesian hierarchical model of protein-binding microarray k-mer data reduces noise and identifies transcription factor subclasses and preferred k-mers

Bioinformatics. 2013 Jun 1;29(11):1390-8. doi: 10.1093/bioinformatics/btt152. Epub 2013 Apr 4.

Abstract

Motivation: Sequence-specific transcription factors (TFs) regulate the expression of their target genes through interactions with specific DNA-binding sites in the genome. Data on TF-DNA binding specificities are essential for understanding how regulatory specificity is achieved.

Results: Numerous studies have used universal protein-binding microarray (PBM) technology to determine the in vitro binding specificities of hundreds of TFs for all possible 8 bp sequences (8mers). We have developed a Bayesian analysis of variance (ANOVA) model that decomposes these 8mer data into background noise, TF family-wise effects and effects specific to the individual TF. Adjusting for background noise improves PBM data quality and concordance with in vivo TF binding data. Moreover, our model simultaneously identifies TF subclasses and their shared sequence preferences, as well as 8mers bound preferentially by individual members of TF subclasses. Such results may aid in deciphering cis-regulatory codes and determinants of protein-DNA binding specificity.
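
To make the three-level decomposition concrete, the following is a minimal sketch of a classical (non-Bayesian) nested ANOVA split of per-8mer scores into a shared background term, a family term and a TF-specific term. It is not the authors' model: the Bayesian priors, noise model and subclass inference are omitted, and the function name, data layout, TF and family labels and score values are all invented for illustration.

```python
from statistics import mean

def decompose(scores):
    """Split scores[family][tf][kmer] into additive mean effects.

    Returns (background, family_effect, tf_effect) such that, for every entry,
    score = background[kmer] + family_effect[family][kmer] + tf_effect[family][tf][kmer].
    Assumes every TF has a score for every k-mer (hypothetical layout).
    """
    kmers = sorted({k for tfs in scores.values() for prof in tfs.values() for k in prof})

    # Mean score of each k-mer within each family.
    family_mean = {
        fam: {k: mean(prof[k] for prof in tfs.values()) for k in kmers}
        for fam, tfs in scores.items()
    }
    # Background: the part of each k-mer's score shared by all families.
    background = {k: mean(family_mean[fam][k] for fam in scores) for k in kmers}
    # Family effect: how a family departs from the shared background.
    family_effect = {
        fam: {k: family_mean[fam][k] - background[k] for k in kmers}
        for fam in scores
    }
    # TF effect: how one TF departs from its own family's mean.
    tf_effect = {
        fam: {tf: {k: prof[k] - family_mean[fam][k] for k in kmers}
              for tf, prof in tfs.items()}
        for fam, tfs in scores.items()
    }
    return background, family_effect, tf_effect

# Made-up scores for two invented families with two invented TFs each.
toy_scores = {
    "family_1": {
        "TF_A": {"TAATTAAA": 0.48, "CACGTGAA": 0.02},
        "TF_B": {"TAATTAAA": 0.44, "CACGTGAA": 0.06},
    },
    "family_2": {
        "TF_C": {"TAATTAAA": 0.05, "CACGTGAA": 0.47},
        "TF_D": {"TAATTAAA": 0.03, "CACGTGAA": 0.49},
    },
}

background, family_effect, tf_effect = decompose(toy_scores)
```

In this toy input, the family term carries the preference shared by both members of a family, while the TF term holds only the small within-family differences. A hierarchical Bayesian treatment would additionally shrink noisy estimates toward the higher-level terms, which this sketch does not attempt.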

Availability and implementation: Source code, compiled code and R and Python scripts are available from http://thebrain.bwh.harvard.edu/hierarchicalANOVA.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Analysis of Variance
  • Artifacts
  • Bayes Theorem
  • Binding Sites
  • Chromatin Immunoprecipitation
  • DNA-Binding Proteins / classification
  • DNA-Binding Proteins / metabolism
  • Oligonucleotide Array Sequence Analysis / methods*
  • Regulatory Elements, Transcriptional*
  • Software
  • Transcription Factors / classification*
  • Transcription Factors / metabolism*

Substances

  • DNA-Binding Proteins
  • Transcription Factors