An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins

Loris Nanni; Alessandra Lumini

doi:10.1007/s00726-008-0044-7


Amino Acids. 2009 Feb;36(2):167-75. doi: 10.1007/s00726-008-0044-7. Epub 2008 Feb 21.

Authors

Loris Nanni¹, Alessandra Lumini

Affiliation

¹ DEIS, IEIIT-CNR, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy. loris.nanni@unibo.it

PMID: 18288459

Abstract

It is well known in the literature that an ensemble of classifiers generally outperforms a stand-alone classifier. Hence, it is important to develop ensemble methods well suited to bioinformatics data. In this work, we propose to combine the feature extraction method based on grouped weight with a set of amino-acid alphabets obtained by a Genetic Algorithm. The proposed method is applied to the prediction of DNA-binding proteins. As classifiers, the linear support vector machine and the radial basis function support vector machine are tested. As performance indicators, the accuracy and the Matthews correlation coefficient are reported. The Matthews correlation coefficient obtained by our ensemble method is approximately 0.97 under jackknife cross-validation. This result outperforms the results previously reported in the literature on the same dataset when the features are extracted directly from the amino-acid sequence.
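The abstract names the grouped-weight encoding but does not define it. The sketch below assumes a common formulation: a reduced alphabet partitions the 20 amino acids into groups; for each group the sequence becomes a binary indicator sequence (1 if the residue is in the group), and the k-th feature is the fraction of ones in the first floor(k*N/L) residues, for k = 1..L, where N is the sequence length and L is the number of segments. The three-group alphabet, the value of L, and all function names are hypothetical illustrations; the Genetic Algorithm alphabets used in the paper are not reproduced here.

```python
# Hypothetical three-group reduced alphabet (illustration only, NOT the
# GA-derived alphabets from the paper): hydrophobic / polar / charged.
HYPOTHETICAL_ALPHABET = [set("AVLIMFWC"), set("GSTYNQPH"), set("DEKR")]


def binary_sequence(seq, group):
    """Indicator sequence: 1 where the residue belongs to `group`, else 0."""
    return [1 if aa in group else 0 for aa in seq]


def grouped_weight(bits, n_segments):
    """Fraction of ones in the first floor(k*N/L) positions, for k = 1..L."""
    n = len(bits)
    features = []
    for k in range(1, n_segments + 1):
        d = max(1, k * n // n_segments)  # prefix length; at least 1 to avoid /0
        features.append(sum(bits[:d]) / d)
    return features


def encode(seq, alphabet, n_segments=10):
    """Concatenate grouped-weight features for every group of the alphabet."""
    features = []
    for group in alphabet:
        features.extend(grouped_weight(binary_sequence(seq, group), n_segments))
    return features
```

With an ensemble of alphabets, each alphabet yields its own feature vector (and, in the paper's setting, its own SVM); for example `encode("ACDK", HYPOTHETICAL_ALPHABET, n_segments=2)` gives a 6-dimensional vector (3 groups times 2 segments).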
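To make the evaluation protocol concrete, the sketch below computes the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), under jackknife (leave-one-out) cross-validation of an ensemble that fuses one score per alphabet with the sum rule. This is a simplification under stated assumptions: a nearest-centroid scorer stands in for the linear and RBF SVMs named in the abstract so that the example runs with the standard library only, raw scores are summed without normalization, and the function names are hypothetical.

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def centroid_score(train_x, train_y, x):
    """Stand-in for an SVM score: positive when x is closer to class 1.

    Assumes each class keeps at least one training sample.
    """
    def centroid(label):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return dist2(x, centroid(0)) - dist2(x, centroid(1))


def jackknife_ensemble(views, labels):
    """Leave-one-out predictions; `views` holds one feature matrix per alphabet."""
    preds = []
    for i in range(len(labels)):
        train_y = labels[:i] + labels[i + 1:]
        total = 0.0
        for X in views:  # sum rule over the per-alphabet classifiers
            total += centroid_score(X[:i] + X[i + 1:], train_y, X[i])
        preds.append(1 if total > 0 else 0)
    return preds
```

Each sample is held out exactly once and predicted by models trained on all remaining samples, which is what the jackknife test refers to; swapping `centroid_score` for a trained SVM decision function would bring the sketch closer to the setting described in the abstract.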

MeSH terms

Algorithms
Amino Acid Sequence
Amino Acids / chemistry
Artificial Intelligence
Computational Biology / methods*
DNA-Binding Proteins / chemistry*
DNA-Binding Proteins / genetics
Databases, Protein
Models, Chemical
Sequence Analysis, Protein / methods*

Substances

Amino Acids
DNA-Binding Proteins