FRKAS: knowledge acquisition using a fuzzy rule base approach to insight of DNA-binding domains/proteins

Protein Pept Lett. 2013 Mar;20(3):299-308. doi: 10.2174/0929866511320030008.

Abstract

Numerous methods for predicting DNA-binding domains/proteins have been proposed, based on identifying informative features and designing effective classifiers. These studies reveal that the DNA-protein binding mechanism is complicated, and that existing accurate predictors, such as support vector machines (SVMs) with position-specific scoring matrices (PSSMs), are black-box methods that biologists cannot easily interpret. In this study, we propose an ensemble fuzzy rule base classifier consisting of a set of interpretable fuzzy rule classifiers (iFRCs) that use informative physicochemical properties as features. In designing the iFRCs, feature selection, membership function design, and fuzzy rule base generation are optimized simultaneously using an intelligent genetic algorithm (IGA). The IGA maximizes prediction accuracy while minimizing the number of selected features and the number of fuzzy rules, yielding an accurate and concise fuzzy rule base. Benchmark datasets of DNA-binding domains are used to evaluate the proposed ensemble classifier of 30 iFRCs. The iFRCs have a mean test accuracy of 77.46%, and the ensemble classifier has a test accuracy of 83.33%, compared with 82.81% for the SVM with PSSMs. The two top-ranked physicochemical properties, by contribution, are positive charge and van der Waals volume. Charge complementarity between protein and DNA is thought to be important in the first step of their mutual recognition. The amino acid residues of binding peptides have larger van der Waals volumes and more positive charge than those of non-binding peptides. The proposed knowledge acquisition method, which establishes a fuzzy rule-based classifier, can also be applied to predicting and analyzing other protein functions from sequences.
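
The general idea of an ensemble of fuzzy rule classifiers can be illustrated with a minimal Python sketch. It is not the authors' implementation: the membership functions (`low`, `high`), the single-winner rule inference, the majority vote, the example rules, the feature values, and the weighted-sum `fitness` with its penalty weights are all assumptions chosen for illustration. The abstract does not specify how the iFRCs are combined or how the IGA weighs its three objectives.

```python
from collections import Counter

def low(x):
    """Hypothetical 'low' membership on a feature normalized to [0, 1]."""
    return max(0.0, min(1.0, 1.0 - x))

def high(x):
    """Hypothetical 'high' membership on a feature normalized to [0, 1]."""
    return max(0.0, min(1.0, x))

class FuzzyRuleClassifier:
    """A rule is (antecedent, label); antecedent maps feature name -> membership function."""

    def __init__(self, rules):
        self.rules = rules

    def predict(self, features):
        # Firing strength of a rule = min over its antecedent memberships (fuzzy AND).
        # The label of the most strongly firing rule wins (assumed inference scheme).
        best_label, best_strength = None, -1.0
        for antecedent, label in self.rules:
            strength = min(mf(features[name]) for name, mf in antecedent.items())
            if strength > best_strength:
                best_label, best_strength = label, strength
        return best_label

def ensemble_predict(classifiers, features):
    """Majority vote over the member classifiers (assumed combination rule)."""
    votes = Counter(clf.predict(features) for clf in classifiers)
    return votes.most_common(1)[0][0]

def fitness(accuracy, n_features, n_rules, w_feat=0.01, w_rule=0.01):
    """Illustrative weighted sum: reward accuracy, penalize model size.
    The weights are arbitrary placeholders, not values from the paper."""
    return accuracy - w_feat * n_features - w_rule * n_rules

# Three toy classifiers over two normalized, made-up peptide features.
clf_a = FuzzyRuleClassifier([
    ({"charge": high, "volume": high}, "binding"),
    ({"charge": low}, "non-binding"),
])
clf_b = FuzzyRuleClassifier([
    ({"volume": high}, "binding"),
    ({"volume": low}, "non-binding"),
])
clf_c = FuzzyRuleClassifier([
    ({"charge": high, "volume": low}, "binding"),
    ({"volume": high}, "non-binding"),
])

peptide = {"charge": 0.8, "volume": 0.7}  # invented values, not data from the study
members = [clf_a, clf_b, clf_c]
member_votes = [clf.predict(peptide) for clf in members]
ensemble_label = ensemble_predict(members, peptide)
print(member_votes, ensemble_label)
```

In this toy setup two of the three members vote "binding", so the ensemble overrules the dissenting member. Each rule stays readable as an IF-THEN statement over physicochemical properties, which is the kind of interpretability the abstract contrasts with black-box predictors.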

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acids / chemistry*
  • DNA / chemistry*
  • DNA-Binding Proteins / chemistry*
  • Databases, Protein
  • Fuzzy Logic*
  • Position-Specific Scoring Matrices
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Support Vector Machine

Substances

  • Amino Acids
  • DNA-Binding Proteins
  • Proteins
  • DNA