Association rule mining to identify transcription factor interactions in genomic regions

Bioinformatics. 2020 Feb 15;36(4):1007-1013. doi: 10.1093/bioinformatics/btz687.

Abstract

Motivation: Genome regulatory networks have different layers and ways to modulate cellular processes, such as cell differentiation, proliferation, and adaptation to external stimuli. Transcription factors and other chromatin-associated proteins act as combinatorial protein complexes that control gene transcription. Thus, identifying functional interaction networks among these proteins is a fundamental task to understand the genome regulation framework.

Results: We developed a novel approach to infer interactions among transcription factors in user-selected genomic regions, by combining the computation of association rules and of a novel Importance Index on ChIP-seq datasets. The hallmark of our method is the definition of the Importance Index, which provides a relevance measure of the interaction among transcription factors found associated in the computed rules. Examples on synthetic data explain the index use and potential. A straightforward pre-processing pipeline enables the easy extraction of input data for our approach from any set of ChIP-seq experiments. Applications on ENCODE ChIP-seq data prove that our approach can reliably detect interactions between transcription factors, including known interactions that validate our approach.

Availability and implementation: A R/Bioconductor package implementing our association rules and Importance Index-based method is available at http://bioconductor.org/packages/release/bioc/html/TFARM.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining
  • Genome*
  • Genomics
  • Software
  • Transcription Factors

Substances

  • Transcription Factors