MISAE: a new approach for regulatory motif extraction

Proc IEEE Comput Syst Bioinform Conf. 2004:173-81. doi: 10.1109/csb.2004.1332430.

Abstract

Recognizing the regulatory motifs of co-regulated genes is essential for understanding regulatory mechanisms. However, automatically extracting regulatory motifs from a data set of upstream non-coding DNA sequences of a family of co-regulated genes is difficult because regulatory motifs are often subtle and inexact. The problem is further complicated by corruption of the data sets. In this paper, a new approach called Mismatch-allowed Probabilistic Suffix Tree Motif Extraction (MISAE) is proposed. It combines a mismatch-allowed probabilistic suffix tree, which is a probabilistic model, with local prediction to extract regulatory motifs. The proposed approach is tested on 15 co-regulated gene families and compares favorably with other state-of-the-art approaches. Moreover, MISAE performs well on "corrupted" data sets: it is able to extract the motif from a "corrupted" data set in which fewer than one fourth of the sequences contain the real motif.
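The abstract does not give the details of the mismatch-allowed probabilistic suffix tree, so the following is only a minimal sketch of the underlying idea of mismatch-tolerant matching, not the MISAE algorithm. It counts how many sequences contain each k-mer within a given Hamming distance. The function names (hamming, kmers, mismatch_support), the parameters k and max_mismatches, and the toy sequences are all assumptions introduced for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mismatch_support(sequences, k, max_mismatches):
    """For every k-mer seen in the input, count the sequences that contain
    it with at most max_mismatches substitutions (hypothetical helper,
    brute force, not a suffix tree)."""
    per_seq = [kmers(s, k) for s in sequences]
    candidates = set().union(*per_seq)
    support = Counter()
    for cand in candidates:
        for words in per_seq:
            if any(hamming(cand, w) <= max_mismatches for w in words):
                support[cand] += 1
    return support


# Toy data: three sequences carry an inexact copy of TTGACA, one does not.
toy = ["CCTTGACAGG", "AATTGACTCC", "GGTAGACAAA", "CCCCCCCCCC"]
exact = mismatch_support(toy, k=6, max_mismatches=0)
fuzzy = mismatch_support(toy, k=6, max_mismatches=1)
print(exact["TTGACA"], fuzzy["TTGACA"])
```

With exact matching the planted motif is supported by a single sequence, while allowing one mismatch recovers the other two inexact copies. This illustrates why tolerance to mismatches matters when motifs are subtle and inexact. A probabilistic model as described in the abstract would go further than these raw counts.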

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Cluster Analysis
  • Computer Simulation
  • DNA / genetics*
  • Entropy
  • Gene Expression / genetics*
  • Gene Expression Profiling / methods*
  • Models, Genetic*
  • Models, Statistical
  • Pattern Recognition, Automated / methods
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA