DNAgenie: accurate prediction of DNA-type-specific binding residues in protein sequences

Brief Bioinform. 2021 Nov 5;22(6):bbab336. doi: 10.1093/bib/bbab336.

Abstract

Efforts to elucidate protein-DNA interactions at the molecular level rely in part on accurate predictions of DNA-binding residues in protein sequences. While there are over a dozen computational predictors of DNA-binding residues, they are DNA-type agnostic and significantly cross-predict residues that interact with other ligands as DNA binding. We leverage a custom-designed machine learning architecture to introduce DNAgenie, a first-of-its-kind predictor of residues that interact with A-DNA, B-DNA and single-stranded DNA. DNAgenie uses a comprehensive physicochemical profile extracted from an input protein sequence and implements a two-step refinement process to provide accurate predictions and to minimize cross-predictions. Comparative tests on an independent test dataset demonstrate that DNAgenie outperforms the current methods that we adapt to predict residue-level interactions with the three DNA types. Further analysis finds that the second (refinement) step leads to a substantial reduction in cross-predictions. Empirical tests show that DNAgenie's outputs, when converted to coarse-grained protein-level predictions, compare favorably against recent tools that predict which DNA-binding proteins interact with double-stranded versus single-stranded DNA. Moreover, predictions from the sequences of the whole human proteome reveal that the results produced by DNAgenie substantially overlap with the known DNA-binding proteins while also including promising leads for several hundred previously unknown putative DNA binders. These results suggest that DNAgenie is a valuable tool for the sequence-based characterization of protein functions. The DNAgenie webserver is available at http://biomine.cs.vcu.edu/servers/DNAgenie/.
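
The conversion of residue-level outputs into coarse-grained protein-level predictions, mentioned above, can be illustrated with a minimal sketch. The abstract does not specify how this conversion is done, so everything below is an assumption made for illustration: the input layout (one list of per-residue scores per DNA type), the names `protein_level_calls` and `strand_label`, the 0.5 score threshold and the one-residue minimum are all hypothetical and are not taken from DNAgenie. The only domain fact used is that A-DNA and B-DNA are double-stranded forms.

```python
from typing import Dict, List

# The three DNA types named in the abstract; the short key "ssDNA" is our own label.
DNA_TYPES = ("A-DNA", "B-DNA", "ssDNA")


def protein_level_calls(
    residue_scores: Dict[str, List[float]],
    threshold: float = 0.5,   # hypothetical cut-off, not DNAgenie's
    min_residues: int = 1,    # hypothetical minimum count of binding residues
) -> Dict[str, bool]:
    """Flag a protein as binding a DNA type if enough residues score >= threshold."""
    calls = {}
    for dna_type in DNA_TYPES:
        scores = residue_scores.get(dna_type, [])
        n_binding = sum(1 for s in scores if s >= threshold)
        calls[dna_type] = n_binding >= min_residues
    return calls


def strand_label(calls: Dict[str, bool]) -> str:
    """Collapse per-type calls into a double- versus single-stranded label."""
    double_stranded = calls["A-DNA"] or calls["B-DNA"]
    single_stranded = calls["ssDNA"]
    if double_stranded and single_stranded:
        return "both"
    if double_stranded:
        return "double-stranded"
    if single_stranded:
        return "single-stranded"
    return "none"


# Toy scores for a three-residue protein (made-up numbers, not real predictions).
example_scores = {
    "A-DNA": [0.1, 0.2, 0.3],
    "B-DNA": [0.1, 0.7, 0.9],
    "ssDNA": [0.0, 0.4, 0.2],
}
example_calls = protein_level_calls(example_scores)
example_label = strand_label(example_calls)
```

With these toy scores only the B-DNA list contains values at or above the threshold, so the protein is labeled as a double-stranded DNA binder. A real conversion would more likely calibrate the threshold and the minimum residue count on a benchmark dataset rather than fixing them by hand.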

Keywords: A-DNA; B-DNA; DNA-binding residues; double-stranded DNA; prediction; protein–DNA interactions; single-stranded DNA.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Base Sequence*
  • Binding Sites*
  • Computational Biology / methods*
  • DNA / chemistry*
  • DNA / genetics
  • DNA-Binding Proteins / chemistry
  • DNA-Binding Proteins / metabolism*
  • Databases, Genetic
  • Machine Learning
  • Models, Molecular
  • Protein Binding
  • Reproducibility of Results
  • Software*
  • Structure-Activity Relationship
  • Web Browser

Substances

  • DNA-Binding Proteins
  • DNA