Data mining of sequences and 3D structures of allergenic proteins

Bioinformatics. 2002 Oct;18(10):1358-64. doi: 10.1093/bioinformatics/18.10.1358.

Abstract

Motivation: Many sequences, and in some cases structures, of proteins that induce an allergic response in atopic individuals have been determined in recent years. These data indicate that allergens, regardless of source, fall into discrete protein families. Sequence similarities may explain clinically observed cross-reactivities between different biological triggers. However, previously available allergy databases group allergens according to their biological sources or observed clinical cross-reactivities, without providing data about the proteins themselves. A computer-aided data mining system is needed to compare the sequence and structural details of known allergens. This information will aid in predicting allergenic cross-responses and eventually in determining possible common characteristics of IgE recognition.

Results: The new web-based Structural Database of Allergenic Proteins (SDAP) permits the user to quickly compare the sequence and structure of allergenic proteins. Data from literature sources and previously existing lists of allergens are combined in an interactive MySQL database with a wide selection of bioinformatics applications. SDAP can be used to rapidly determine the relationship between allergens and to screen novel proteins for the presence of IgE or T-cell epitopes they may share with known allergens. Further, our novel similarity search method, based on five-dimensional descriptors of amino acid properties, can be used to scan the SDAP entries with a peptide sequence. For example, when a known IgE binding epitope from shrimp tropomyosin was used as a query, the method rapidly identified a similar sequence in known shellfish and insect allergens. This prediction of cross-reactivity between allergens is consistent with clinical observations.
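The peptide scan described above can be illustrated with a minimal Python sketch. It is not the SDAP implementation: the abstract does not give the descriptor values or the distance function, so the five-component vectors below are arbitrary placeholders and the per-position Euclidean sum is an assumption. The names `TOY_DESCRIPTORS`, `window_distance`, and `scan` are hypothetical, and the example sequences are made up rather than real epitopes.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# PLACEHOLDER descriptors: arbitrary, deterministic 5-vectors, one per residue.
# These are NOT published amino acid property values; substitute real ones.
TOY_DESCRIPTORS = {
    aa: tuple(math.sin((i + 1) * (k + 1)) for k in range(5))
    for i, aa in enumerate(AMINO_ACIDS)
}


def window_distance(query, window, descriptors=TOY_DESCRIPTORS):
    """Sum of per-position Euclidean distances in descriptor space.

    Assumed scoring scheme; the source does not specify the actual one.
    """
    return sum(
        math.dist(descriptors[a], descriptors[b])
        for a, b in zip(query, window)
    )


def scan(query, sequence, descriptors=TOY_DESCRIPTORS):
    """Slide the query along the sequence.

    Returns a list of (distance, start, window) tuples, smallest distance first.
    """
    n = len(query)
    hits = [
        (window_distance(query, sequence[i:i + n], descriptors), i, sequence[i:i + n])
        for i in range(len(sequence) - n + 1)
    ]
    return sorted(hits)


if __name__ == "__main__":
    # Made-up query and target, for illustration only.
    for distance, start, window in scan("ACDEF", "WWWACDEFWWW")[:3]:
        print(f"{start:3d}  {window}  {distance:.3f}")
```

A window identical to the query scores zero, and windows with physicochemically similar residues would score low if real property descriptors were supplied. Ranking every window, rather than returning only the best one, lets near matches across several database entries be compared side by side.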

Availability: SDAP is available on the web at http://fermi.utmb.edu/SDAP/index.html

MeSH terms

  • Allergens / chemistry*
  • Allergens / classification
  • Allergens / genetics
  • Allergens / immunology
  • Amino Acid Sequence
  • Cross Reactions
  • Database Management Systems
  • Databases, Protein*
  • Immunoglobulin E / chemistry
  • Immunoglobulin E / classification
  • Immunoglobulin E / genetics
  • Information Storage and Retrieval / methods*
  • Internet
  • Molecular Sequence Data
  • Protein Conformation
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / genetics
  • Proteins / immunology
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods
  • Sequence Analysis, Protein / methods*
  • Software
  • Species Specificity

Substances

  • Allergens
  • Proteins
  • Immunoglobulin E