Accurate classification and hemagglutinin amino acid signatures for influenza A virus host-origin association and subtyping

Virology. 2014 Jan 20;449:328-38. doi: 10.1016/j.virol.2013.11.010. Epub 2013 Dec 22.

Abstract

Host-origin classification and signatures of influenza A viruses were investigated on the basis of the HA protein, in order to track the host of origin of HA. Hidden Markov models (HMMs), decision trees and associative classification models were generated for each influenza A virus subtype and its major hosts (human, avian, swine). Host- and subtype-specific features of the HA protein signatures were sought, and host-associated signatures that occurred across different subtypes of the virus were identified. The classification models were evaluated using ROC curves, and the amino acid class-association rules were evaluated by their support and confidence ratings. Host classification for each HA subtype achieved accuracies between 91.2% and 100% using decision trees after feature selection. Host-specific class-association rules for avian host origin gave the best support and confidence ratings, followed by human and finally swine origin. This finding indicates lower specificity of the swine host, perhaps pointing to its ability to mix different strains.
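Support and confidence, the measures used to rate the class-association rules, are the standard association-rule statistics: support is the fraction of all sequences that match both the rule's amino acid condition and its host label, and confidence is the fraction of sequences matching the condition that also carry that host label. The sketch below is only an illustration, assuming rules of the simple form "residue r at alignment position p implies host h"; the abstract does not state the exact rule format. The toy sequences, positions and function names (`rule_support_confidence`, `mine_rules`) are hypothetical and are not data, results or code from the study.

```python
from typing import List, Tuple

# Hypothetical toy records: (aligned HA fragment, host label).
# Invented for illustration only; NOT sequences from the study.
Record = Tuple[str, str]
RECORDS: List[Record] = [
    ("QSG", "avian"),
    ("QSG", "avian"),
    ("LSS", "human"),
    ("LSG", "human"),
    ("QSS", "swine"),
]


def rule_support_confidence(records: List[Record], position: int,
                            residue: str, host: str) -> Tuple[float, float]:
    """Support and confidence of the assumed rule form
    'residue at position (0-based) -> host'."""
    n = len(records)
    # Records whose sequence matches the rule's condition (antecedent).
    matching = [host_label for seq, host_label in records
                if seq[position] == residue]
    # Records matching the condition AND carrying the predicted host.
    hits = sum(1 for host_label in matching if host_label == host)
    support = hits / n if n else 0.0
    confidence = hits / len(matching) if matching else 0.0
    return support, confidence


def mine_rules(records: List[Record], min_support: float,
               min_confidence: float) -> List[Tuple[int, str, str, float, float]]:
    """Enumerate all single-position rules and keep those meeting both
    thresholds (a simplified, hypothetical rule-mining step)."""
    hosts = sorted({host for _, host in records})
    length = min(len(seq) for seq, _ in records)
    rules = []
    for pos in range(length):
        for residue in sorted({seq[pos] for seq, _ in records}):
            for host in hosts:
                sup, conf = rule_support_confidence(records, pos, residue, host)
                if sup >= min_support and conf >= min_confidence:
                    rules.append((pos, residue, host, sup, conf))
    return rules


if __name__ == "__main__":
    for pos, residue, host, sup, conf in mine_rules(RECORDS, 0.4, 0.6):
        print(f"pos {pos} = {residue} -> {host}: "
              f"support {sup:.2f}, confidence {conf:.2f}")
```

On the toy data, "L at position 0 implies human" has support 0.40 and confidence 1.00, while "Q at position 0 implies avian" has the same support but confidence 0.67 because one Q-bearing record is labelled swine. A host whose sequences share residues with other hosts' sequences lowers the confidence of those rules in the same way.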

Keywords: HA protein; Hemagglutinin subtyping; Influenza; Virus signature.

MeSH terms

  • Amino Acid Motifs
  • Amino Acid Sequence
  • Animals
  • Birds
  • Hemagglutinin Glycoproteins, Influenza Virus / chemistry
  • Hemagglutinin Glycoproteins, Influenza Virus / genetics*
  • Humans
  • Influenza A virus / classification
  • Influenza A virus / genetics
  • Influenza A virus / isolation & purification*
  • Influenza in Birds / virology*
  • Influenza, Human / virology*
  • Molecular Sequence Data
  • Mutation, Missense
  • Orthomyxoviridae Infections / veterinary*
  • Orthomyxoviridae Infections / virology
  • Species Specificity
  • Swine
  • Swine Diseases / virology*

Substances

  • Hemagglutinin Glycoproteins, Influenza Virus