Prediction of protein allergenicity based on signal-processing bioinformatics approach

Annu Int Conf IEEE Eng Med Biol Soc. 2014:2014:808-11. doi: 10.1109/EMBC.2014.6943714.

Abstract

Current bioinformatics tools achieve high accuracy in classifying allergenic protein sequences with high homology but generally perform poorly on low-homology protein sequences. Although some homologous regions explain Immunoglobulin E (IgE) cross-reactivity in groups of allergens, no universal molecular structure could be associated with allergenicity. In addition, studies have shown that cross-reactivity is not directly linked to the homology between protein sequences. Therefore, a new homology-independent method needs to be developed to determine whether a protein is an allergen. The aim of this study is to differentiate sets of allergenic and non-allergenic proteins using a signal-processing-based bioinformatics approach. In this paper, a new method is proposed for the characterisation and classification of allergenic protein sequences. In this method, a hydrophobicity amino acid index was used to encode proteins as numerical sequences, and the Discrete Fourier Transform was used to extract features for each protein. Finally, a classifier was constructed based on Support Vector Machines. To demonstrate the applicability of the proposed method, 857 allergen and 1000 non-allergen proteins were collected from the UniProt online database. The proposed method yielded: MCC: 0.752 ± 0.007, Specificity: 0.912 ± 0.005, Sensitivity: 0.835 ± 0.008 and Total Accuracy: 87.65% ± 0.004.
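
The encoding and feature-extraction steps described in the abstract, together with the reported evaluation metrics, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the hydrophobicity index, so the Kyte-Doolittle scale is assumed here, and the fixed spectrum length (`n_points`), the mean-centring, and the function names `encode`, `dft_magnitudes` and `classification_metrics` are all hypothetical choices made for illustration. The Support Vector Machine step is omitted to keep the example dependency-free.

```python
import cmath
import math

# Kyte-Doolittle hydropathy scale. ASSUMPTION: the abstract only says
# "hydrophobicity amino acid index" and does not specify which one.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Map each residue to its hydrophobicity value; unknown symbols are skipped."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]


def dft_magnitudes(signal, n_points=64):
    """Return the DFT magnitude spectrum for frequencies 0..n_points//2.

    ASSUMPTION: the signal is truncated or zero-padded to n_points and
    mean-centred so that proteins of different lengths give vectors of
    equal size; the abstract does not say how this was handled.
    """
    window = list(signal[:n_points])
    if window:
        mean = sum(window) / len(window)
        window = [x - mean for x in window]
    window += [0.0] * (n_points - len(window))
    magnitudes = []
    for k in range(n_points // 2 + 1):
        # Direct evaluation of X[k] = sum_n x[n] * exp(-2*pi*i*k*n / N)
        total = sum(
            x * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n, x in enumerate(window)
        )
        magnitudes.append(abs(total))
    return magnitudes


def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical short peptide, used only to show the call sequence.
    features = dft_magnitudes(encode("MKTAYIAKQRQISFVKSHFSRQ"), n_points=32)
    print(len(features))
```

In a fuller pipeline, the vectors returned by `dft_magnitudes` for each protein would serve as inputs to a Support Vector Machine classifier (for example `sklearn.svm.SVC` in scikit-learn), and the confusion-matrix counts from cross-validation would be passed to `classification_metrics`.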

MeSH terms

  • Allergens / chemistry*
  • Allergens / classification*
  • Amino Acid Sequence
  • Computational Biology / methods*
  • Databases, Protein
  • Sequence Analysis, Protein
  • Signal Processing, Computer-Assisted*

Substances

  • Allergens