Prediction of active sites of enzymes by maximum relevance minimum redundancy (mRMR) feature selection

Mol Biosyst. 2013 Jan 27;9(1):61-9. doi: 10.1039/c2mb25327e. Epub 2012 Nov 2.

Abstract

Identifying catalytic residues is key to understanding how enzymes work. Although numerous computational methods have been developed to predict catalytic residues and active sites, prediction accuracy remains relatively low and false-positive rates remain high. In this work, we developed a novel predictor based on the Random Forest (RF) algorithm, aided by the maximum relevance minimum redundancy (mRMR) method and incremental feature selection (IFS). We incorporated features describing physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure, and solvent accessibility to predict the active sites of enzymes, achieving an overall accuracy of 0.885687 and an MCC of 0.689226 on an independent test dataset. Feature analysis showed that every feature category except disorder contributed to the identification of active sites. Site-specific feature analysis further showed that features derived from the active site itself contributed most to active site determination. Our prediction method may become a useful tool for identifying active sites, and the key features identified in this work may provide valuable insights into the mechanism of catalysis.
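
As a rough illustration of how mRMR ranking, incremental feature selection, and the MCC fit together, the following is a minimal sketch and not the authors' implementation. It assumes discrete (or pre-discretized) feature columns, uses the difference form of the mRMR criterion (relevance minus mean redundancy), and takes a hypothetical `evaluate` callback that stands in for a cross-validated classifier score. The abstract names Random Forest as the classifier, but the sketch does not train one. All function names here are illustrative.

```python
from collections import Counter
from math import log, sqrt


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr_rank(columns, labels):
    """Greedy mRMR ordering of feature indices.

    Each step picks the feature with the highest relevance to the labels
    minus its mean mutual information with the features already picked.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    remaining = list(range(len(columns)))
    ranked = []
    while remaining:
        def score(j):
            if not ranked:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[k]) for k in ranked
            ) / len(ranked)
            return relevance[j] - redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(ranked, evaluate):
    """Score the top-1, top-2, ... prefixes of the ranking; keep the best.

    `evaluate` is a hypothetical callback: it receives a list of feature
    indices and returns a score (for example, a cross-validated MCC).
    """
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        s = evaluate(subset)
        if s > best_score:
            best_subset, best_score = subset, s
    return best_subset, best_score


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data (invented for illustration): feature 1 duplicates feature 0,
# while feature 2 is equally relevant but independent of feature 0.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 1, 0, 1],
]
ranked = mrmr_rank(columns, labels)  # the duplicate column is ranked last
```

In this toy case the redundancy term pushes the duplicated column to the end of the ranking even though its relevance matches the others. In practice, `evaluate` would wrap whatever classifier and validation scheme is in use, and continuous features would need discretization or a different mutual-information estimator.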

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Catalytic Domain
  • Chemical Phenomena
  • Computational Biology / methods*
  • Conserved Sequence
  • Databases, Protein
  • Decision Trees
  • Enzymes / chemistry*
  • Enzymes / metabolism*
  • Models, Chemical*
  • Protein Structure, Secondary
  • Sequence Analysis, Protein
  • Structure-Activity Relationship
  • Support Vector Machine

Substances

  • Enzymes