Identification of protein-protein binding sites by incorporating the physicochemical properties and stationary wavelet transforms into pseudo amino acid composition

J Biomol Struct Dyn. 2016 Sep;34(9):1946-61. doi: 10.1080/07391102.2015.1095116. Epub 2015 Oct 29.

Abstract

With the explosive growth of protein sequences entering protein data banks in the post-genomic era, there is a strong demand for automated methods that rapidly and effectively identify protein-protein binding sites (PPBSs) from sequence information alone. To address this problem, we proposed a predictor called iPPBS-PseAAC, in which each amino acid residue site of the proteins concerned is treated as a 15-tuple peptide segment, generated by sliding a window along the protein chain with its center aligned with the target residue. The working peptide segment is then formulated by a general form of pseudo amino acid composition via the following procedures: (1) it is converted into a numerical series via the physicochemical properties of amino acids; (2) the numerical series is subsequently converted into a 20-D feature vector by means of the stationary wavelet transform technique. The prediction engine is a two-layer ensemble classifier formed from many individual "Random Forest" classifiers, with the 1st layer voting out the best training data-set from many bootstrap systems and the 2nd layer voting out the most relevant one from seven physicochemical properties. Cross-validation tests indicate that the new predictor is very promising, suggesting that many key features, which are deeply hidden in complicated protein sequences, can be extracted via the wavelet transform approach. This is consistent with the fact that many important biological functions of proteins can be elucidated through their low-frequency internal motions. The web server of iPPBS-PseAAC is accessible at http://www.jci-bioinfo.cn/iPPBS-PseAAC, through which users can easily obtain their desired results without needing to follow the complicated mathematical equations involved.
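The feature-extraction steps described in the abstract (a 15-residue window centered on the target residue, conversion to a numerical series, and a stationary wavelet transform yielding 20 numbers) can be illustrated with a minimal sketch. It is not the published implementation, and several choices are assumptions made only for illustration: the Kyte-Doolittle hydropathy index stands in for one of the seven physicochemical properties, the chain ends are padded with a hypothetical dummy residue "X" encoded as 0.0, the wavelet is an undecimated Haar transform with circular boundaries, and the 20-D vector is formed as four statistics over five sub-bands (four detail levels plus the final approximation). All function names are hypothetical.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index; an illustrative stand-in for one
# physicochemical property (assumption, not taken from the abstract).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

PAD = "X"  # hypothetical dummy residue for positions beyond the chain ends


def windows(seq, size=15):
    """Yield (index, segment) with each segment centered on residue `index`."""
    half = size // 2
    padded = PAD * half + seq + PAD * half
    for i in range(len(seq)):
        yield i, padded[i:i + size]


def encode(segment, scale=HYDROPATHY):
    """Map a segment to a numerical series; unknown/padding residues -> 0.0."""
    return [scale.get(aa, 0.0) for aa in segment]


def haar_swt(signal, levels=4):
    """Undecimated (a trous) Haar transform with circular boundaries.

    Returns [detail_1, ..., detail_levels, approx_levels]; every band has
    the same length as the input because no downsampling is performed.
    """
    n = len(signal)
    approx = list(signal)
    bands = []
    for level in range(levels):
        step = 2 ** level  # filter taps are spread apart at each level
        shifted = [approx[(i + step) % n] for i in range(n)]
        detail = [(a - b) / 2 ** 0.5 for a, b in zip(approx, shifted)]
        approx = [(a + b) / 2 ** 0.5 for a, b in zip(approx, shifted)]
        bands.append(detail)
    bands.append(approx)
    return bands


def features(segment):
    """Summarize 5 sub-bands with 4 statistics each: a 20-D vector (assumed layout)."""
    vec = []
    for band in haar_swt(encode(segment)):
        vec += [max(band), min(band), mean(band), pstdev(band)]
    return vec


if __name__ == "__main__":
    for idx, seg in windows("MKTAYIAKQRQISFVKSHFSRQ"):
        print(idx, seg, len(features(seg)))
```

In a pipeline like the one the abstract describes, one such vector per residue and per physicochemical property would be passed to the classifiers; the ensemble and voting layers are not sketched here.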

Keywords: asymmetric bootstrap; physicochemical property; protein–protein binding sites; pseudo amino acid composition; random forest; stationary wavelet transform.

MeSH terms

  • Algorithms
  • Amino Acids / chemistry*
  • Binding Sites*
  • Carrier Proteins / chemistry*
  • Carrier Proteins / metabolism
  • Models, Theoretical
  • Protein Binding
  • Protein Interaction Domains and Motifs
  • Proteins / chemistry*
  • Proteins / metabolism

Substances

  • Amino Acids
  • Carrier Proteins
  • Proteins