Protein interaction hotspot identification using sequence-based frequency-derived features

IEEE Trans Biomed Eng. 2013 Nov;60(11):2993-3002. doi: 10.1109/TBME.2011.2161306. Epub 2011 Jul 7.

Abstract

Finding descriptors that discriminate hotspot residues from other residues remains a challenge in efforts to understand protein interaction. In this paper, descriptors derived from the analysis of amino acid sequences using digital signal processing (DSP) techniques are shown to be as good as those derived from protein tertiary structure and/or information on the complex. The simulation results show that these descriptors, used on their own with a random forest classifier, predict hotspots with an accuracy of 79% and a precision of 75%. Used jointly with features derived from tertiary structures, they raise performance to an accuracy of 82% and a precision of 80%.
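The abstract does not say which numerical encoding or which transform the authors apply, so the following is only an illustrative sketch of what a sequence-based, frequency-derived descriptor might look like. It assumes the Kyte-Doolittle hydropathy index as the residue encoding and a plain discrete Fourier transform over a window around the residue of interest. The names `encode`, `power_spectrum`, `window_features`, and the `half_width` parameter are hypothetical and do not come from the paper.

```python
import cmath

# Kyte-Doolittle hydropathy index. ASSUMPTION: the abstract does not name
# the encoding used by the authors; this scale is chosen for illustration.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map a one-letter amino acid sequence to a numeric signal."""
    return [KD[aa] for aa in seq.upper()]


def power_spectrum(signal):
    """Return |DFT|^2 of the mean-removed signal for frequencies 0..n//2."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def window_features(seq, center, half_width=7):
    """Hypothetical per-residue descriptor: spectrum of a local window.

    The window is clipped at the sequence ends, so residues near the
    termini produce shorter feature vectors.
    """
    lo = max(0, center - half_width)
    hi = min(len(seq), center + half_width + 1)
    return power_spectrum(encode(seq[lo:hi]))
```

Feature vectors of this kind, one per residue, could then be passed to any random forest implementation together with hotspot labels. The window size and the choice of encoding are tuning decisions the abstract leaves unspecified.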

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Amino Acids / chemistry
  • Computational Biology / methods*
  • Computer Simulation
  • Models, Molecular
  • Protein Interaction Mapping / methods*
  • Proteins / chemistry*
  • Reproducibility of Results
  • Sequence Analysis, Protein / methods*

Substances

  • Amino Acids
  • Proteins