Sequence based human leukocyte antigen gene prediction using informative physicochemical properties

Int J Data Min Bioinform. 2015;13(3):211-24. doi: 10.1504/ijdmb.2015.072072.

Abstract

Prediction of the different classes within the human leukocyte antigen (HLA) gene family can provide insight into the human immune system and its response to viral pathogens. It is therefore desirable to develop a method for predicting HLA gene class that is more efficient and more easily interpretable than existing methods. We investigated the HLA gene prediction problem as follows: (a) we established a dataset (HLA262) by reducing the pairwise sequence identity within the complete HLA dataset to 30%; (b) we proposed a feature set of informative physicochemical properties that, combined with a support vector machine (SVM) in a predictor named HLAPred, achieves high accuracy and sensitivity (90.04% and 82.99%, respectively) relative to existing methods; and (c) we analysed the informative physicochemical properties to better understand the physicochemical characteristics and molecular mechanisms of the HLA gene family.
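The general idea behind step (b), turning a protein sequence into a fixed-length vector of physicochemical features for an SVM, can be illustrated with a minimal sketch. It assumes that each property is a per-residue scale averaged over the whole sequence. The Kyte-Doolittle hydropathy values are the published scale, but the simplified charge scale, the averaging scheme, and the function names `encode` and `accuracy_and_sensitivity` are hypothetical choices for illustration. They are not the HLAPred feature set, which the abstract does not list. The resulting vectors would then be passed to an SVM implementation such as LIBSVM or scikit-learn's `SVC`. The sketch also shows the conventional definitions of the two reported metrics, treating one HLA class as the positive class. The paper may define sensitivity differently, for example averaged over classes.

```python
from typing import Dict, List, Sequence, Tuple

# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical, simplified side-chain charge at neutral pH (an assumption,
# not an HLAPred property): K and R count as +1, D and E as -1, others 0.
CHARGE: Dict[str, float] = {aa: 0.0 for aa in KYTE_DOOLITTLE}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})

# Illustrative property set; a real predictor would use selected scales.
PROPERTIES: List[Dict[str, float]] = [KYTE_DOOLITTLE, CHARGE]


def encode(sequence: str,
           properties: List[Dict[str, float]] = PROPERTIES) -> List[float]:
    """Return one feature per property scale: the mean scale value over
    all standard amino acids in the sequence (other letters are skipped)."""
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [sum(scale[aa] for aa in residues) / len(residues)
            for scale in properties]


def accuracy_and_sensitivity(y_true: Sequence, y_pred: Sequence,
                             positive) -> Tuple[float, float]:
    """Accuracy = fraction of correct predictions.
    Sensitivity = TP / (TP + FN) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("no predictions to evaluate")
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp + fn == 0:
        raise ValueError("no true instances of the positive class")
    return correct / len(pairs), tp / (tp + fn)
```

For example, `encode("ACDK")` returns approximately `[-0.775, 0.0]`, and `accuracy_and_sensitivity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1], positive=1)` returns `(0.6, 0.666...)`. Averaging keeps the vector length independent of sequence length, which is what a standard SVM requires. The trade-off is that positional information along the sequence is lost.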

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Data Mining / methods
  • Databases, Protein*
  • HLA-A Antigens / chemistry*
  • HLA-A Antigens / immunology*
  • Humans
  • Leukocytes / chemistry
  • Leukocytes / immunology*
  • Molecular Sequence Data
  • Pattern Recognition, Automated / methods
  • Sequence Analysis, Protein / methods*
  • Structure-Activity Relationship
  • Support Vector Machine*

Substances

  • HLA-A Antigens