A subspace method for the detection of transcription factor binding sites

Bioinformatics. 2012 May 15;28(10):1328-35. doi: 10.1093/bioinformatics/bts147. Epub 2012 Mar 29.

Abstract

Motivation: The identification of the sites at which transcription factors (TFs) bind to deoxyribonucleic acid (DNA) is an important problem in molecular biology. Many computational methods have been developed for motif finding, most of them based on position-specific scoring matrices (PSSMs), which assume that the positions within a binding site are independent. However, some experimental and computational studies demonstrate that interdependences among positions exist.

Results: In this article, we introduce a novel motif-finding method that constructs a subspace based on the covariance of numerical DNA sequences. When a candidate sequence is projected onto the modeled subspace, a confidence threshold on the Q-residuals allows us to predict whether this sequence is a binding site. Using the TRANSFAC and JASPAR databases, we compared our Q-residuals detector with existing PSSM methods. For most of the TF binding sites studied, the Q-residuals detector performs significantly better and faster than MATCH and MAST. Compared with Motifscan, a method that takes interdependences into account, the Q-residuals detector performs better when the number of available sequences is small.
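
The idea of projecting a candidate onto a covariance-based subspace and thresholding its residual can be illustrated with a minimal sketch. This is not the authors' implementation: the one-hot encoding, the use of principal components of the covariance matrix as the subspace, the function names (`one_hot`, `fit_subspace`, `q_residual`, `is_binding_site`), the number of components, and the fixed threshold are all assumptions made for illustration. The abstract derives the cutoff from a Q-residuals confidence limit, which is not reproduced here; the threshold is simply passed in by the caller. The example sequences are toy data, not taken from TRANSFAC or JASPAR.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat vector of length 4 * len(seq).
    (Assumed encoding; the abstract only says 'numerical DNA sequences'.)"""
    mat = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        mat[i, BASES.index(base)] = 1.0
    return mat.ravel()

def fit_subspace(sites, n_components):
    """Estimate the mean and the leading eigenvectors of the covariance
    matrix of the encoded training sites (a PCA-style subspace)."""
    X = np.array([one_hot(s) for s in sites])
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues ascending
    top = np.argsort(eigvals)[::-1][:n_components]    # keep the largest ones
    return mean, eigvecs[:, top]

def q_residual(seq, mean, P):
    """Squared norm of the part of the centered sequence that the
    subspace spanned by the columns of P does not explain."""
    x = one_hot(seq) - mean
    residual = x - P @ (P.T @ x)
    return float(residual @ residual)

def is_binding_site(seq, mean, P, threshold):
    """Predict a site when the Q-residual is at or below the threshold.
    (The threshold is supplied by the caller; it is a placeholder for
    the confidence limit mentioned in the abstract.)"""
    return q_residual(seq, mean, P) <= threshold

# Toy training set of equal-length sites (illustrative only).
sites = ["TGACTCA", "TGAGTCA", "TGACTAA", "TGAGTAA", "TGACTCA", "TGAGTCA"]
mean, P = fit_subspace(sites, n_components=1)

for candidate in ["TGAGTCA", "CCCCCCC"]:
    print(candidate, q_residual(candidate, mean, P))
```

A sequence resembling the training sites leaves a small residual, while an unrelated sequence lies far from the subspace and leaves a large one. Because the subspace is built from the covariance matrix, correlated variation between positions can be captured by a single component, which is the kind of interdependence a PSSM cannot represent.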

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Binding Sites
  • Humans
  • Nucleotide Motifs*
  • Position-Specific Scoring Matrices*
  • Protein Binding
  • Sequence Analysis, DNA / methods
  • Transcription Factors / chemistry
  • Transcription Factors / genetics
  • Transcription Factors / metabolism*

Substances

  • Transcription Factors