Signal-processing-based bioinformatics approach for the identification of influenza A virus subtypes in neuraminidase genes

Annu Int Conf IEEE Eng Med Biol Soc. 2013:2013:3066-9. doi: 10.1109/EMBC.2013.6610188.

Abstract

Neuraminidase (NA) genes of the influenza A virus are highly promising candidates for antiviral drug development, but that potential can only be realized if their subtypes are identified correctly. To detect the subtypes accurately, this paper develops a hybrid predictive model and tests it on proteins obtained from four subtypes of the influenza A virus, namely H1N1, H2N2, H3N2 and H5N1, that caused major pandemics in the twentieth century. The predictive model is built in four main steps: (i) encoding the protein sequences as numerical signals by means of the EIIP (electron-ion interaction potential) amino acid scale, (ii) analysing these signals with the Discrete Fourier Transform (DFT) and extracting DFT-based features, (iii) selecting a more influential subset of the features with the F-score statistical feature selection method, and finally (iv) building a predictive model on that feature subset with a support vector machine classifier. The protein sequences were chosen for the high percentage identity they demonstrate within individual influenza subtype classes and for the high variation they display in percentage identity. This makes the proteins very difficult to distinguish from one another, even when they belong to different subtypes. On this set of proteins, the predictive model yielded 98.3% accuracy under 5-fold cross-validation. It also produced a twenty-feature subset that can help reveal the spectral characteristics of the subtypes. The proposed model is promising and can readily be generalized to other similar studies.
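
Steps (i) to (iii) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the EIIP values are those commonly tabulated in the literature and should be checked against a primary source, and the function names, the 512-point transform length, the removal of the mean before the DFT, and the multi-class form of the F-score are all choices made here for illustration, since the abstract does not specify them.

```python
import numpy as np

# EIIP amino acid scale (values as commonly tabulated; verify before use).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def encode_eiip(seq):
    """Step (i): map each residue of a protein sequence to its EIIP value."""
    return np.array([EIIP[aa] for aa in seq.upper()], dtype=float)


def dft_features(seq, n_points=512):
    """Step (ii): magnitude spectrum of the EIIP signal.

    The signal is mean-centred (assumption), then zero-padded or cropped
    to n_points so every sequence yields the same number of features.
    The zero-frequency bin is dropped.
    """
    signal = encode_eiip(seq)
    signal = signal - signal.mean()
    spectrum = np.abs(np.fft.rfft(signal, n=n_points))
    return spectrum[1:]


def f_score(X, y):
    """Step (iii): per-feature F-score, generalized to several classes.

    Assumed form: sum over classes of (class mean - overall mean)^2,
    divided by the sum of within-class sample variances.
    Each class needs at least two samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        X_c = X[y == label]
        between += (X_c.mean(axis=0) - overall_mean) ** 2
        within += X_c.var(axis=0, ddof=1)
    return between / (within + 1e-12)  # small constant avoids division by zero


def top_k_features(X, y, k=20):
    """Indices of the k features with the highest F-score."""
    return np.argsort(f_score(X, y))[::-1][:k]
```

For step (iv), the selected columns of the feature matrix could be passed to any support vector machine implementation and scored with 5-fold cross-validation; the abstract does not state which kernel or parameters were used.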

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Computational Biology / methods*
  • Humans
  • Influenza A Virus, H1N1 Subtype / genetics
  • Influenza A Virus, H2N2 Subtype / genetics
  • Influenza A Virus, H3N2 Subtype / genetics
  • Influenza A Virus, H5N1 Subtype / genetics
  • Influenza A virus / genetics*
  • Neuraminidase / chemistry
  • Neuraminidase / genetics*
  • Sequence Analysis, Protein
  • Sequence Homology, Amino Acid
  • Signal Processing, Computer-Assisted*
  • Support Vector Machine

Substances

  • Neuraminidase