DIpartite: A tool for detecting bipartite motifs by considering base interdependencies

PLoS One. 2019 Aug 30;14(8):e0220207. doi: 10.1371/journal.pone.0220207. eCollection 2019.

Abstract

Identifying transcription factor binding sites (TFBSs) is extremely important. Some TFBSs are thought to be bipartite motifs, that is, two-block motifs separated by gap sequences of variable length. While the position weight matrix (PWM) is commonly used to represent and predict TFBSs, the dinucleotide weight matrix (DWM) can also express interdependencies between neighboring bases. By incorporating DWM into the detection of bipartite motifs, we have developed DIpartite (bipartite motif detection tool based on dinucleotide weight matrix), a novel tool for ab initio motif detection that uses a Gibbs sampling strategy and the minimization of Shannon's entropy. DIpartite predicts bipartite motifs by considering the interdependencies of neighboring positions, that is, by using DWM. We compared DIpartite with other available alternatives on test datasets for CRP in E. coli, sigma factors in B. subtilis, and human promoter sequences. We have developed DIpartite for the detection of TFBSs, particularly bipartite motifs. DIpartite enables ab initio prediction of conserved motifs based not only on PWM but also on DWM. We evaluated the performance of DIpartite by comparing it with freely available tools such as MEME, BioProspector, BiPad, and AMD. Taken together, our findings show that DIpartite performs equivalently to or better than these tools, especially for detecting bipartite motifs with variable gaps. DIpartite requires users to specify the motif lengths, the gap length, and whether to use PWM or DWM. DIpartite is available at https://github.com/Mohammad-Vahed/DIpartite.
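To make the PWM/DWM distinction and the bipartite (two-block plus variable gap) scoring concrete, here is a minimal sketch. It is not DIpartite's code: the function names (`build_pwm`, `build_dwm`, `score_pwm`, `score_dwm`, `entropy`, `best_bipartite_hit`), the pseudocount of 0.5, the uniform background, and the toy sequences are all hypothetical illustrations. The DWM shown is one simple formulation that sums log-odds of overlapping neighboring dinucleotides; DIpartite's exact DWM formulation may differ. The sketch also omits the Gibbs sampling search itself: it only scores one sequence against already-known blocks, with `entropy` standing in for the kind of objective such a sampler would try to minimize.

```python
import math
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]


def build_pwm(sites, pseudocount=0.5):
    """Hypothetical helper: column-wise base probabilities from aligned sites."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: c / total for b, c in counts.items()})
    return pwm


def build_dwm(sites, pseudocount=0.5):
    """Hypothetical helper: dinucleotide probabilities per neighboring pair (i, i+1)."""
    dwm = []
    for i in range(len(sites[0]) - 1):
        counts = {d: pseudocount for d in DINUCS}
        for s in sites:
            counts[s[i:i + 2]] += 1
        total = sum(counts.values())
        dwm.append({d: c / total for d, c in counts.items()})
    return dwm


def score_pwm(pwm, kmer):
    """Log2-odds against an assumed uniform background of 1/4 per base."""
    return sum(math.log2(col[b] / 0.25) for col, b in zip(pwm, kmer))


def score_dwm(dwm, kmer):
    """Sum of log2-odds of overlapping dinucleotides (assumed 1/16 background)."""
    return sum(math.log2(col[kmer[i:i + 2]] * 16) for i, col in enumerate(dwm))


def entropy(pwm):
    """Total Shannon entropy in bits; lower values mean a more conserved motif."""
    return -sum(p * math.log2(p) for col in pwm for p in col.values())


def best_bipartite_hit(seq, left, right, left_len, right_len,
                       min_gap, max_gap, scorer):
    """Best (score, left_start, gap) over all block positions and gap lengths."""
    best = None
    for start in range(len(seq) - left_len + 1):
        left_score = scorer(left, seq[start:start + left_len])
        for gap in range(min_gap, max_gap + 1):
            r = start + left_len + gap
            if r + right_len > len(seq):
                break  # any larger gap would also run past the sequence end
            total = left_score + scorer(right, seq[r:r + right_len])
            if best is None or total > best[0]:
                best = (total, start, gap)
    return best


# Toy, made-up training blocks loosely shaped like E. coli -35/-10 boxes.
left_sites = ["TTGACA", "TTGACT", "TTTACA", "TTGACA"]
right_sites = ["TATAAT", "TATACT", "GATAAT", "TATAAT"]
seq = "GGG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGG"

pwm_hit = best_bipartite_hit(seq, build_pwm(left_sites), build_pwm(right_sites),
                             6, 6, 15, 19, score_pwm)
dwm_hit = best_bipartite_hit(seq, build_dwm(left_sites), build_dwm(right_sites),
                             6, 6, 15, 19, score_dwm)
print(pwm_hit[1:], dwm_hit[1:])  # (3, 17) (3, 17)
```

Both models locate the left block at offset 3 with a 17-base gap here. The difference is in what they can represent: the PWM treats every column independently, whereas the DWM assigns a weight to each adjacent base pair and can therefore capture dependencies between neighboring positions that a PWM cannot.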

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Pairing
  • Clostridium / genetics
  • Computational Biology / methods*
  • Cyclic AMP Receptor Protein / genetics
  • Escherichia coli / genetics
  • Escherichia coli Proteins / genetics
  • Humans
  • Nucleotide Motifs*
  • Position-Specific Scoring Matrices
  • Promoter Regions, Genetic / genetics
  • Sigma Factor / genetics

Substances

  • Cyclic AMP Receptor Protein
  • Escherichia coli Proteins
  • Sigma Factor
  • crp protein, E coli

Supplementary concepts

  • Clostridium subterminale

Grants and funding

This work was partly supported by MEXT KAKENHI (16K18671) to HT, AMED under Grant Number JP19fm0208024 to HT, the Tenure Tracking System Program of MEXT to HT, the Institute for Global Prominent Research, Chiba University, to HT, and MEXT KAKENHI (16H06279) to HT and JI. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.