CORECLUST: identification of the conserved CRM grammar together with prediction of gene regulation

Anna A Nikulova; Alexander V Favorov; Roman A Sutormin; Vsevolod J Makeev; Andrey A Mironov

doi:10.1093/nar/gks235

Nucleic Acids Res. 2012 Jul;40(12):e93. doi: 10.1093/nar/gks235. Epub 2012 Mar 15.

Authors

Anna A Nikulova¹, Alexander V Favorov, Roman A Sutormin, Vsevolod J Makeev, Andrey A Mironov

Affiliation

¹ Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73 Leninskie Gory, Moscow 119991, Russia. nikanka@bioinf.fbb.msu.ru

Abstract

Identifying transcriptional regulatory regions and tracing their internal organization is important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', a preferred arrangement of binding sites that is crucial for proper regulation and therefore tends to be evolutionarily conserved. Here, we present CORECLUST (COnservative REgulatory CLUster STructure), a method that predicts CRMs from a set of positional weight matrices (PWMs). Given the regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative locations of binding sites. The resulting model can then be used for genome-wide prediction of similar CRMs, and hence for detecting co-regulated genes, as well as for investigating the regulatory grammar of the system. Compared with related methods, CORECLUST performs better at identifying CRMs that confer muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.
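The abstract does not spell out CORECLUST's scoring model, so the following is only a generic sketch of the two ingredients it names. The first is scanning a sequence with positional weight matrices to find candidate binding sites. The second is collecting the relative positions of sites for two factors, which is the raw material a CRM "grammar" describes. All function names, the 7/1 toy count matrices, the 0.25 pseudocount, the uniform background, the forward-strand-only scan and the score thresholds are hypothetical choices made for illustration. They are not taken from the paper.

```python
import math

BASES = "ACGT"


def counts_from_consensus(consensus, match=7, mismatch=1):
    """Hypothetical toy count matrix: `match` counts for the consensus base, `mismatch` for others."""
    return [{b: (match if b == c else mismatch) for b in BASES} for c in consensus]


def log_odds_pwm(counts, pseudocount=0.25, background=None):
    """Convert per-position base counts into log2-odds scores against a background."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (start, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


def pairwise_spacings(hits_a, hits_b):
    """Start-to-start offsets from each site of factor A to each site of factor B."""
    return sorted(b - a for a, _ in hits_a for b, _ in hits_b)


if __name__ == "__main__":
    seq = "CCGATACCGATA"
    gata_hits = scan(seq, log_odds_pwm(counts_from_consensus("GATA")), 5.0)
    ccg_hits = scan(seq, log_odds_pwm(counts_from_consensus("CCG")), 4.0)
    print([p for p, _ in gata_hits], [p for p, _ in ccg_hits])
    print(pairwise_spacings(gata_hits, ccg_hits))
```

On the toy sequence, the thresholds admit only perfect consensus matches. The GATA sites start at positions 2 and 8, the CCG sites at 0 and 6, and the offsets are [-8, -2, -2, 4]. A method that looks for conserved arrangements would compare distributions of offsets like these across orthologous or co-regulated regions. How CORECLUST actually does this is described in the full paper, not in this sketch.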

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
Body Patterning / genetics
Drosophila / embryology
Drosophila / genetics
Drosophila / metabolism
Enhancer Elements, Genetic
Gene Expression Regulation*
Gene Expression Regulation, Developmental
Muscles / metabolism
Position-Specific Scoring Matrices
Regulatory Elements, Transcriptional*
Sequence Analysis, DNA*
Software

Grants and funding

R01 GM076655/GM/NIGMS NIH HHS/United States