Using random forest to classify linear B-cell epitopes based on amino acid properties and molecular features

Biochimie. 2014 Aug:103:1-6. doi: 10.1016/j.biochi.2014.03.016. Epub 2014 Apr 8.

Abstract

Identification and characterization of B-cell epitopes in target antigens is one of the key steps in epitope-driven vaccine design, immunodiagnostic tests, and antibody production. Experimental determination of epitopes is labor-intensive and expensive, so there is an urgent need for computational methods that identify B-cell epitopes reliably. In the current study, we proposed a novel peptide feature description method that combines peptide amino acid properties with chemical molecular features. Based on these combined features, a random forest (RF) classifier was adopted to classify B-cell epitopes and non-epitopes. RF is an ensemble method that uses recursive partitioning to generate many trees and aggregates their results, and it often produces highly competitive models. The classification accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), and area under the curve (AUC) values for the current method were 78.31%, 80.05%, 72.23%, 0.5836, and 0.8800, respectively. These results showed that an appropriate combination of peptide amino acid features and chemical molecular features with an RF model could enhance the prediction performance for linear B-cell epitopes. Finally, a free online service is available at http://sysbio.yznu.cn/Research/Epitopesprediction.aspx.

Keywords: Amino acid properties; Chemical molecular features; Computational method; Epitopes identification.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acids / chemistry*
  • Artificial Intelligence*
  • Chemical Phenomena
  • Computational Biology / methods*
  • Databases, Protein
  • Epitopes, B-Lymphocyte / chemistry*
  • Internet
  • ROC Curve
  • Sequence Homology

Substances

  • Amino Acids
  • Epitopes, B-Lymphocyte