MCAT: Motif Combining and Association Tool

J Comput Biol. 2019 Jan;26(1):1-15. doi: 10.1089/cmb.2018.0113. Epub 2018 Nov 10.

Abstract

De novo motif discovery in biological sequences is an important and computationally challenging problem. Many algorithms have been developed to solve it, with varying success, but even a small number of these tools can struggle to reach a consensus. Because individual tools can be better suited to specific scenarios, an ensemble tool that combines the results of many algorithms can yield a more confident and complete result. We present MCAT (Motif Combining and Association Tool), a novel and fast ensemble tool for de novo motif discovery that combines six state-of-the-art motif discovery tools (MEME, BioProspector, DECOD, XXmotif, Weeder, and CMF). We apply MCAT to data sets of DNA sequences from various species and compare our results with those of two well-established ensemble motif-finding tools, EMD and DynaMIT. The experimental results show that MCAT identifies exact-match motifs in DNA sequences efficiently and performs significantly better in practice.
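
As a rough illustration of the ensemble idea summarized in the abstract, the sketch below keeps only those motifs that several tools report identically. It is not MCAT's actual combining procedure, which the abstract does not describe; the function name `consensus_motifs`, the `min_support` parameter, and the toy motif strings are all assumptions made up for this example.

```python
from collections import Counter


def consensus_motifs(tool_results, min_support=2):
    """Return (motif, support) pairs reported by at least `min_support` tools.

    `tool_results` maps a tool name to the list of motif strings it reported.
    This is a simple exact-match vote, assumed here for illustration only.
    """
    support = Counter()
    for motifs in tool_results.values():
        # Count each motif once per tool, ignoring case and duplicates.
        support.update({m.upper() for m in motifs})
    # Rank by descending support, breaking ties alphabetically.
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(motif, n) for motif, n in ranked if n >= min_support]


# Hypothetical outputs; these are not real results from the named tools.
toy_results = {
    "MEME": ["TATAAA", "GGGCGG"],
    "BioProspector": ["TATAAA", "CACGTG"],
    "Weeder": ["tataaa", "GGGCGG", "GGGCGG"],
}
consensus = consensus_motifs(toy_results)
```

Raising `min_support` trades completeness for confidence: fewer motifs survive, but each one is backed by more tools.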

Keywords: ensemble algorithm; motif finding; protein-binding site.

MeSH terms

  • Algorithms
  • Animals
  • Computational Biology / methods*
  • Humans
  • Sequence Analysis, DNA / methods
  • Software