An Effective Antifreeze Protein Predictor with Ensemble Classifiers and Comprehensive Sequence Descriptors

Int J Mol Sci. 2015 Sep 7;16(9):21191-214. doi: 10.3390/ijms160921191.

Abstract

Antifreeze proteins (AFPs) play a pivotal role in the antifreeze effect of overwintering organisms. They have a wide range of applications in numerous fields, such as improving the production of crops and the quality of frozen foods. Accurate identification of AFPs may provide important clues to decipher the underlying mechanisms of AFPs in ice-binding and to facilitate the selection of the most appropriate AFPs for several applications. Based on an ensemble learning technique, this study proposes an AFP identification system called AFP-Ensemble. In this system, random forest classifiers are trained by different training subsets and then aggregated into a consensus classifier by majority voting. The resulting predictor yields a sensitivity of 0.892, a specificity of 0.940, an accuracy of 0.938 and a balanced accuracy of 0.916 on an independent dataset, which are far better than the results obtained by previous methods. These results reveal that AFP-Ensemble is an effective and promising predictor for large-scale determination of AFPs. The detailed feature analysis in this study may give useful insights into the molecular mechanisms of AFP-ice interactions and provide guidance for the related experimental validation. A web server has been designed to implement the proposed method.

Keywords: antifreeze proteins; ensemble method; majority voting; random forest.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Analysis of Variance
  • Antifreeze Proteins / chemistry*
  • Antifreeze Proteins / classification*
  • Computational Biology / methods*
  • Internet
  • Reproducibility of Results
  • Support Vector Machine*
  • Web Browser

Substances

  • Antifreeze Proteins