Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology

J Theor Biol. 2014 Sep 7:356:213-22. doi: 10.1016/j.jtbi.2014.04.040. Epub 2014 May 10.

Abstract

Because lipid binding proteins (LBPs) play central roles in many biological processes, sequence-based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function, which results in low accuracy of sequence homology-based methods. There is therefore a need for alternative functional prediction methods that do not depend on sequence similarity. To distinguish LBPs from non-LBPs, this study compared the performance of support vector machine (SVM) and neural network classifiers. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that the SVM outperforms the neural network. The SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in distinguishing LBPs from non-LBPs, and on average 92.06% (CV) and 92.90% (IE) in classifying the different LBP classes. Increasing the number and range of extracted protein features and optimizing the SVM parameters significantly improved LBP class prediction compared with the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on a broad set of computationally calculated protein features and offers a promising tool for detecting LBP classes. The proposed approach has the potential to be integrated with and improve the common sequence alignment-based methods.
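
The five-fold cross-validation protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. The abstract does not list the protein features used, so amino acid composition is assumed here as a placeholder feature, and a nearest-centroid rule stands in for the SVM and neural network so that the example needs only the Python standard library. All function names (`composition`, `k_fold_indices`, `fit_centroids`, `predict`, `cross_validate`) are hypothetical.

```python
import random
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq: str) -> List[float]:
    """Assumed placeholder feature: fraction of each standard amino acid."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[int, List[float]]:
    """Stand-in classifier (not an SVM): mean feature vector per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[int, List[float]], row: Sequence[float]) -> int:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


def cross_validate(X: Sequence[Sequence[float]], y: Sequence[int],
                   k: int = 5, seed: int = 0) -> float:
    """Overall accuracy: each sample is predicted once, by a model not trained on it."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        centroids = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(centroids, X[i]) == y[i] for i in test_idx)
    return correct / len(X)
```

In use, `X` would hold one feature vector per protein and `y` the labels (for example 1 for LBP, 0 for non-LBP). An independent evaluation in the sense of the abstract would additionally hold back a separate set of proteins that is never used during cross-validation or parameter tuning.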

Keywords: Lipid metabolism; Machine learning; Protein features; Support vector machine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Carrier Proteins* / chemistry
  • Carrier Proteins* / classification
  • Carrier Proteins* / genetics
  • Humans
  • Lipids*
  • Models, Genetic*
  • Neural Networks, Computer*
  • Protein Binding
  • Sequence Analysis, Protein / methods*
  • Support Vector Machine*

Substances

  • Carrier Proteins
  • Lipids