Recognition for avian influenza virus proteins based on support vector machine and linear discriminant analysis

Sci China B Chem. 2008;51(2):166-170. doi: 10.1007/s11426-008-0006-7.

Abstract

A total of 200 properties related to structural characteristics were used to represent the structures of 400 HA coded proteins of influenza virus, which served as training samples. Recognition models for HA proteins of avian influenza virus (AIV) were developed using support vector machine (SVM) and linear discriminant analysis (LDA). With LDA, the identification accuracy (R_ia) is 99.8% for the training samples and 99.5% by leave-one-out cross-validation. With SVM, R_ia is 99.8% for the training samples and 99.3% by leave-one-out cross-validation. A further 200 HA proteins of influenza virus were used to validate the external predictive power of the resulting models. The external R_ia is 95.5% by LDA and 96.5% by SVM, which shows that HA proteins of AIVs are well recognized by both SVM and LDA, and that on the external set SVM performs slightly better than LDA.
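The three accuracy figures quoted in the abstract (training, leave-one-out, external) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic 2-D points rather than the 200 structural descriptors, the function names (fit_lda, loo_accuracy, make_toy_data) are hypothetical, and the classifier is a plain two-class Fisher LDA with a midpoint threshold, not the authors' implementation. An SVM could be evaluated with the same three functions by swapping the fit and predict steps.

```python
import random

def fit_lda(X, y):
    """Two-class Fisher LDA for 2-D points; returns (w, threshold)."""
    groups = {c: [x for x, label in zip(X, y) if label == c] for c in (0, 1)}
    means = {c: [sum(p[j] for p in pts) / len(pts) for j in range(2)]
             for c, pts in groups.items()}
    # Pooled within-class scatter matrix (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for c, pts in groups.items():
        for p in pts:
            d = [p[0] - means[c][0], p[1] - means[c][1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    # Invert the 2x2 scatter matrix explicitly.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    # Discriminant direction w = S^-1 (mean_1 - mean_0).
    diff = [means[1][0] - means[0][0], means[1][1] - means[0][1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold at the projected midpoint of the class means (equal priors).
    mid = [(means[0][j] + means[1][j]) / 2 for j in range(2)]
    return w, w[0] * mid[0] + w[1] * mid[1]

def predict(model, x):
    w, threshold = model
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0

def accuracy(model, X, y):
    """Identification accuracy: fraction of samples assigned the right class."""
    return sum(predict(model, x) == c for x, c in zip(X, y)) / len(y)

def loo_accuracy(X, y):
    """Leave-one-out: refit without sample i, then test on sample i."""
    hits = 0
    for i in range(len(y)):
        model = fit_lda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(model, X[i]) == y[i]
    return hits / len(y)

def make_toy_data(n_per_class, seed):
    """Synthetic stand-in data: two Gaussian clouds, one per class."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            X.append([rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)])
            y.append(label)
    return X, y

X_train, y_train = make_toy_data(50, seed=0)   # plays the role of the training set
X_ext, y_ext = make_toy_data(25, seed=1)       # plays the role of the external set
model = fit_lda(X_train, y_train)
train_acc = accuracy(model, X_train, y_train)
loo_acc = loo_accuracy(X_train, y_train)
ext_acc = accuracy(model, X_ext, y_ext)
print(f"training {train_acc:.3f}  leave-one-out {loo_acc:.3f}  external {ext_acc:.3f}")
```

The external set is never seen during fitting, which is why the abstract treats the external figure as the measure of predictive power, separate from the training and leave-one-out figures.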

Keywords: HA protein; avian influenza virus (AIV); linear discriminant analysis (LDA); support vector machine (SVM).