Using random forest to classify T-cell epitopes based on amino acid properties and molecular features

Anal Chim Acta. 2013 Dec 4:804:70-5. doi: 10.1016/j.aca.2013.10.003. Epub 2013 Oct 12.

Abstract

T-lymphocytes (T-cells) are a very important component of the human immune system. T-cell epitopes can be used to accurately monitor immune responses activated by the major histocompatibility complex (MHC) and to rationally design vaccines. Accurate prediction of T-cell epitopes is therefore crucial for vaccine development and clinical immunology. In the current study, two types of peptide features, i.e., amino acid properties and chemical molecular features, were used to represent T-cell epitope peptides. Based on these features, the random forest (RF) algorithm, a powerful machine learning algorithm, was used to classify T-cell epitopes and non-T-cell epitopes. The classification accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), and area under the curve (AUC) values for the proposed method are 97.54%, 97.22%, 97.60%, 0.9193, and 0.9868, respectively. These results indicate that the current method, based on the combined features and RF, is effective for T-cell epitope prediction.
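For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, sensitivity, specificity, and MCC are conventionally computed from a binary confusion matrix. It uses the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study, whose confusion matrix is not given in the abstract.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: epitopes predicted as epitope / non-epitope.
    tn/fp: non-epitopes predicted as non-epitope / epitope.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must contain at least one sample")

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not the paper's data).
example = binary_metrics(tp=90, fp=20, tn=180, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

MCC is useful alongside accuracy when the epitope and non-epitope classes differ in size, because it accounts for all four cells of the confusion matrix. AUC is not included here since it requires ranked prediction scores rather than counts.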

Keywords: Amino acid properties; Chemical molecular features; Major histocompatibility complex (MHC); Random forest (RF); T cell receptors (TCRs); T-cell epitopes.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / chemistry*
  • Epitopes / chemistry*
  • Internet
  • ROC Curve
  • T-Lymphocytes / chemistry*

Substances

  • Amino Acids
  • Epitopes