Effects of windowing and zero-padding on Complex Resonant Recognition Model for protein sequence analysis

Annu Int Conf IEEE Eng Med Biol Soc. 2011:2011:4955-8. doi: 10.1109/IEMBS.2011.6091228.

Abstract

Signal processing techniques such as the Fourier Transform have been widely studied and successfully applied in many different areas. Techniques such as zero-padding and windowing have been developed and found very useful for improving the outcome of signal processing methods. The Resonant Recognition Model (RRM) and the Complex Resonant Recognition Model (CRRM), which are based on the discrete Fourier Transform and widely used for the analysis of protein sequences, do not consider such techniques, even though they can improve or alter the features extracted from the protein sequences. Therefore, in this paper, an extensive analysis was carried out to investigate the influence of zero-padding and windowing on the features extracted by the Complex Resonant Recognition Model. To present these effects, five different classes of influenza A virus Neuraminidase genes, namely H1N1, H1N2, H2N2, H3N2 and H5N1, were used as a case study. For each of the influenza A subtypes and for each signal length used in the analysis, two sets of Common Frequency Peaks (CFP) were extracted: one with windowing applied and one without windowing. Zero-padding was used to bring all the signals (protein sequences) to the same length. The signal lengths used in this study were 470, which is the maximum protein length, and also 512, 1024, 2048, 4096, 8192 and 16384 for further analysis. The results suggest that windowing and zero-padding have a key impact on the CFP extracted from the influenza A subtypes, as the best match with the CFP extracted from the influenza A subtypes using CRRM was obtained when a signal length of 4096 and windowing were both applied. Therefore, the outcome of this study should be taken into consideration for more accurate and reliable analysis of protein sequences.
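The two operations studied here, windowing and zero-padding ahead of a discrete Fourier Transform, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which window function was used or how amino acids are mapped to complex values, so the Hann window, the helper names (hann_window, zero_pad, dft_magnitude, spectrum) and the toy complex-valued sequence are all assumptions made for illustration.

```python
import cmath
import math


def hann_window(n):
    """Return an n-point symmetric Hann window (an assumed choice of window)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def zero_pad(signal, length):
    """Append zeros so that the signal has exactly `length` samples."""
    if length < len(signal):
        raise ValueError("target length is shorter than the signal")
    return list(signal) + [0.0] * (length - len(signal))


def dft_magnitude(signal):
    """Magnitude of the discrete Fourier transform, computed directly in O(N^2)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * m / n) for m, x in enumerate(signal)))
        for k in range(n)
    ]


def spectrum(signal, length, use_window):
    """Optionally window the signal, zero-pad it to `length`, and return |DFT|."""
    if use_window:
        window = hann_window(len(signal))
        signal = [x * w for x, w in zip(signal, window)]
    return dft_magnitude(zero_pad(signal, length))


# Toy complex-valued sequence standing in for an encoded protein (hypothetical data):
# a single complex exponential at a normalised frequency of 0.2.
toy = [cmath.exp(2j * math.pi * 0.2 * m) for m in range(50)]

# Compare the location of the strongest peak with and without windowing,
# at the original length and at a longer zero-padded length.
for length in (50, 128):
    for use_window in (False, True):
        mag = spectrum(toy, length, use_window)
        peak = max(range(length), key=lambda k: mag[k])
        print(length, use_window, round(peak / length, 4))
```

In this sketch the window is applied over the original samples before the zeros are appended, so the padding only refines the frequency grid on which the spectrum is sampled. Whether the study applied the window before or after padding is not stated in the abstract.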

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Influenza A virus / metabolism*
  • Molecular Sequence Data
  • Pattern Recognition, Automated / methods*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, Protein / methods*
  • Viral Proteins / chemistry*
  • Viral Proteins / metabolism*

Substances

  • Viral Proteins