A threading-based method for the prediction of DNA-binding proteins with application to the human genome

PLoS Comput Biol. 2009 Nov;5(11):e1000567. doi: 10.1371/journal.pcbi.1000567. Epub 2009 Nov 13.

Abstract

Diverse mechanisms for DNA-protein recognition have been elucidated in numerous atomic complex structures from various protein families. These structural data provide an invaluable knowledge base not only for understanding DNA-protein interactions, but also for developing specialized methods that predict the DNA-binding function from protein structure. While such methods are useful, a major limitation is that they require an experimental structure of the target as input. To overcome this obstacle, we develop a threading-based method, DNA-Binding-Domain-Threader (DBD-Threader), for the prediction of DNA-binding domains and associated DNA-binding protein residues. Our method, which uses a template library composed of DNA-protein complex structures, requires only the target protein's sequence. In our approach, fold similarity and DNA-binding propensity are employed as two functional discriminating properties. In benchmark tests on 179 DNA-binding and 3,797 non-DNA-binding proteins, using templates whose sequence identity is less than 30% to the target, DBD-Threader achieves a sensitivity/precision of 56%/86%. This performance is considerably better than the standard sequence comparison method PSI-BLAST and is comparable to DBD-Hunter, which requires an experimental structure as input. Moreover, for over 70% of predicted DNA-binding domains, the backbone Root Mean Square Deviations (RMSDs) of the top-ranked structural models are within 6.5 Å of their experimental structures, with their associated DNA-binding sites identified at satisfactory accuracy. Additionally, DBD-Threader correctly assigned the SCOP superfamily for most predicted domains. To demonstrate that DBD-Threader is useful for automatic function annotation on a large scale, DBD-Threader was applied to 18,631 protein sequences from the human genome; 1,654 proteins are predicted to have DNA-binding function. Comparison with existing Gene Ontology (GO) annotations suggests that approximately 30% of our predictions are new. Finally, we present some interesting predictions in detail. In particular, it is estimated that approximately 20% of classic zinc finger domains play a functional role not related to direct DNA-binding.
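
For readers unfamiliar with the two benchmark metrics quoted above, the following Python sketch shows how sensitivity and precision are conventionally computed from prediction counts. The abstract reports only the percentages (56%/86%) and the benchmark sizes (179 DNA-binding, 3,797 non-DNA-binding proteins); the true-positive, false-negative, and false-positive counts below are back-calculated approximations for illustration, not figures reported by the authors, and the function names are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual DNA-binding proteins that were predicted as such."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted DNA-binding proteins that truly bind DNA."""
    return tp / (tp + fp)


# Benchmark size taken from the abstract.
n_binding = 179

# ASSUMPTION: counts back-calculated from the rounded percentages in the
# abstract (56% sensitivity, 86% precision); the exact counts are not given.
tp = round(0.56 * n_binding)   # approximate true positives
fn = n_binding - tp            # approximate false negatives
fp = round(tp / 0.86 - tp)     # approximate false positives

print(f"sensitivity ~ {sensitivity(tp, fn):.2f}")
print(f"precision   ~ {precision(tp, fp):.2f}")
```

Under these assumed counts, a precision of 86% would mean that only a small number of the 3,797 non-DNA-binding proteins are falsely predicted, which is why precision rather than overall accuracy is the more informative figure on such an imbalanced benchmark.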

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Animals
  • Computational Biology / methods*
  • DNA-Binding Proteins / chemistry
  • DNA-Binding Proteins / genetics*
  • DNA-Binding Proteins / physiology
  • Data Interpretation, Statistical
  • Drosophila melanogaster / genetics
  • Genes
  • Genome, Human*
  • Humans
  • Models, Molecular
  • Protein Binding / genetics
  • Sequence Analysis, Protein / methods*

Substances

  • DNA-Binding Proteins