Computational learning on specificity-determining residue-nucleotide interactions

Nucleic Acids Res. 2015 Dec 2;43(21):10180-9. doi: 10.1093/nar/gkv1134. Epub 2015 Nov 2.

Abstract

The protein-DNA interactions between transcription factors and transcription factor binding sites are essential to gene regulation. Deciphering the binding codes, by understanding the binding mechanism across different transcription factor DNA-binding families, is a long-standing challenge. Past computational learning studies have usually focused on learning and predicting the DNA-binding residues on the protein side. Taking both sides (protein and DNA) into account, we propose and describe a computational study for learning the specificity-determining residue-nucleotide interactions of different known DNA-binding domain families. The proposed learning models are compared comprehensively with state-of-the-art models, demonstrating their competitive learning performance. In addition, we describe two applications that demonstrate how the learnt models can provide meaningful insights into protein-DNA interactions across different DNA-binding families.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Binding Sites
  • Computational Biology / methods
  • DNA / chemistry*
  • DNA / metabolism
  • DNA-Binding Proteins / chemistry*
  • DNA-Binding Proteins / metabolism
  • Humans
  • Machine Learning
  • Models, Molecular
  • Nucleotide Motifs
  • Position-Specific Scoring Matrices
  • Protein Binding
  • Protein Structure, Tertiary
  • Sequence Analysis, DNA
  • Sequence Analysis, Protein / methods*

Substances

  • DNA-Binding Proteins
  • DNA