Discovering Gene Regulatory Elements Using Coverage-Based Heuristics

Rami Al-Ouran; Robert Schmidt; Ashwini Naik; Jeffrey Jones; Frank Drews; David Juedes; Laura Elnitski; Lonnie Welch

doi:10.1109/TCBB.2015.2496261

IEEE/ACM Trans Comput Biol Bioinform. 2018 Jul-Aug;15(4):1290-1300. doi: 10.1109/TCBB.2015.2496261. Epub 2015 Oct 30.

PMID: 26540692

Abstract

Data mining algorithms and sequencing methods (such as RNA-seq and ChIP-seq) are being combined to discover genomic regulatory motifs that relate to a variety of phenotypes. However, motif discovery algorithms often produce very long lists of putative transcription factor binding sites, which makes it difficult to select a manageable set of candidate motifs for experimental validation and so hinders the discovery of phenotype-related regulatory elements. To address this issue, we introduce the motif selection problem and provide coverage-based search heuristics for its solution. Analysis of 203 ChIP-seq experiments from the ENCyclopedia of DNA Elements project shows that our algorithms produce motifs with high sensitivity and specificity, and it reveals new insights about the regulatory code of the human genome. The greedy algorithm performs best, selecting a median of two motifs per ChIP-seq transcription factor group while achieving a median sensitivity of 77 percent.
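The abstract does not spell out the heuristics, so the following is only a minimal sketch of how a coverage-based greedy selection could look, assuming the problem is framed as a set-cover-style task: each candidate motif is mapped to the set of sequences (for example, ChIP-seq peaks) in which it occurs, and motifs are chosen one at a time by how many still-uncovered sequences they add. The function name `greedy_motif_selection`, the `target_coverage` parameter, the motif and sequence labels, and the reading of sensitivity as the covered fraction of input sequences are all illustrative assumptions, not the authors' implementation.

```python
def greedy_motif_selection(motif_hits, sequences, target_coverage=1.0):
    """Greedy set-cover-style selection (illustrative sketch, not the paper's code).

    motif_hits: dict mapping a motif name to the set of sequence IDs it occurs in.
    sequences: iterable of all sequence IDs that should be covered.
    target_coverage: fraction of sequences to cover before stopping (assumed knob).
    Returns (selected motif names in pick order, set of covered sequence IDs).
    """
    uncovered = set(sequences)
    total = len(uncovered)
    covered = set()
    selected = []

    while len(covered) < target_coverage * total:
        # Pick the motif that adds the most still-uncovered sequences.
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(motif_hits), key=lambda m: len(motif_hits[m] & uncovered))
        gain = motif_hits[best] & uncovered
        if not gain:
            # No remaining motif covers anything new, so stop early.
            break
        selected.append(best)
        covered |= gain
        uncovered -= gain

    return selected, covered


# Hypothetical toy input: four candidate motifs over five sequences.
hits = {
    "M1": {"s1", "s2", "s3"},
    "M2": {"s3", "s4"},
    "M3": {"s4"},
    "M4": {"s1"},
}
all_sequences = ["s1", "s2", "s3", "s4", "s5"]

selected, covered = greedy_motif_selection(hits, all_sequences)
# Assumed definition: sensitivity as the fraction of input sequences covered.
sensitivity = len(covered) / len(all_sequences)
```

In this toy input no motif occurs in `s5`, so the loop stops once every remaining motif adds nothing new. Measuring specificity would additionally require a background (negative) sequence set, which this sketch leaves out.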

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Chromatin Immunoprecipitation
Computational Biology / methods*
Computer Heuristics
Disease / genetics
Humans
Nucleotide Motifs / genetics
Regulatory Sequences, Nucleic Acid / genetics*
Sequence Analysis, DNA