HIV-1 protease cleavage site prediction based on two-stage feature selection method

Protein Pept Lett. 2013 Mar;20(3):290-8. doi: 10.2174/0929866511320030007.

Abstract

Understanding the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. An accurate, robust, and rapid method for predicting cleavage sites in proteins is therefore crucial to the search for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with a genetic algorithm. Thirty important biochemical features were selected, based on a jackknife test, from the original data set containing 4,248 features. Using the AdaBoost method with the thirty selected features, the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, an increase in accuracy over the original data set of 6.7% and 77.4%, respectively. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.
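To illustrate the idea behind correlation-based feature subset selection, the sketch below scores a feature subset with the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation. It is not the authors' implementation. The use of absolute Pearson correlation, the toy data, and the names `pearson`, `cfs_merit`, and `best_subset` are assumptions made for illustration, and the exhaustive search is a stand-in for the genetic algorithm named in the abstract, which would use the same merit as its fitness function.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(columns, labels, subset):
    """CFS merit of a subset of column indices (assumption: |Pearson| as correlation)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean feature-class correlation: rewards features that track the label.
    r_cf = sum(abs(pearson(columns[j], labels)) for j in subset) / k
    # Mean feature-feature correlation: penalizes redundant features.
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def best_subset(columns, labels):
    """Exhaustive search over non-empty subsets (toy stand-in for a genetic algorithm)."""
    indices = range(len(columns))
    candidates = [s for k in range(1, len(columns) + 1) for s in combinations(indices, k)]
    return max(candidates, key=lambda s: cfs_merit(columns, labels, s))


# Hypothetical toy data: 1 = cleaved, 0 = not cleaved.
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [0.1, 0.2, 0.0, 0.9, 1.0, 0.8],  # informative feature
    [0.1, 0.2, 0.0, 0.9, 1.0, 0.8],  # exact duplicate of the first (redundant)
    [0.5, 0.1, 0.9, 0.4, 0.2, 0.8],  # weakly related to the label
]

merit_informative = cfs_merit(columns, labels, (0,))
merit_with_duplicate = cfs_merit(columns, labels, (0, 1))
merit_with_noise = cfs_merit(columns, labels, (0, 2))
chosen = best_subset(columns, labels)
```

In this toy case, adding the duplicate feature leaves the merit unchanged and adding the weakly related feature lowers it, which is why the merit favors small, non-redundant subsets. Exhaustive search is only feasible for a handful of features; with thousands of candidate features, a heuristic search such as a genetic algorithm is needed.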

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Binding Sites
  • HIV / enzymology*
  • HIV Infections / enzymology*
  • HIV Protease / chemistry*
  • HIV Protease / genetics
  • Humans
  • Models, Chemical
  • Structure-Activity Relationship

Substances

  • HIV Protease
  • p16 protease, Human immunodeficiency virus 1