Motif analysis in DNAse hypersensitivity regions uncovers distal cis elements associated with gene expression

Bioinformation. 2013;9(4):212-5. doi: 10.6026/97320630009212. Epub 2013 Feb 21.

Abstract

Reliable identification of cis regulatory elements influencing transcription remains a challenging problem in molecular bioinformatics. This is especially true for enhancer elements which are often located hundreds of kilobases from the gene promoter. High resolution DNase hypersensitivity and connectivity profiling by the ENCODE consortium provides evidence of millions of interacting cis-acting elements in the human genome. This prior knowledge can be incorporated into genome-wide expression analyses, in the form of gene sets sharing regulatory sequence motifs in known DNase hypersensitivity peak regions. High proportions of enrichment among the most extreme differentially transcribed genes from controlled biological experiments may suggest novel hypotheses about signalling pathways. The utility of this approach is demonstrated with the reanalysis of a microarray-derived gene expression data set through the Gene Set Enrichment Analysis pipeline, uncovering new putative distal cis elements in the context of innate immunity. The DNase Hypersensitivity Connectivity informed Motif Enrichment in Gene Expression (DHC-MEGE) method described here has the advantage of identifying distal elements such as enhancers, which are often overlooked with standard promoter motif analysis.

Availability: The DHC-MEGE shell script can be obtained from Sourceforge https://sourceforge.net/projects/dhcmege/ and the generated GMT file is attached as supplementary data.

Keywords: DNAse hypersensitivity; enhancer; gene expression; gene set enrichment analysis; motif enrichment.