Medical data set classification using a new feature selection algorithm combined with twin-bounded support vector machine

Márcio Dias de Lima; Juliana de Oliveira Roque E Lima; Rommel M Barbosa

doi:10.1007/s11517-019-02100-z

Med Biol Eng Comput. 2020 Mar;58(3):519-528. doi: 10.1007/s11517-019-02100-z. Epub 2020 Jan 4.

Authors

Márcio Dias de Lima¹ ², Juliana de Oliveira Roque E Lima³, Rommel M Barbosa⁴

Affiliations

¹ Instituto Federal de Educação, Ciência e Tecnologia de Goiás, R. 75 - St. Central, Goiânia, GO, CEP 74055-110, Brazil.
² Instituto de Informática, Universidade Federal de Goiás, Alameda Palmeiras, Quadra D, Câmpus Samambaia, Goiânia, GO, CEP 74690-900, Brazil.
³ Faculdade de Enfermagem, Universidade Federal de Goiás, Rua 227 Qd 68, S/N - Setor Leste Universitário, Goiânia, GO, CEP 74605-080, Brazil.
⁴ Instituto de Informática, Universidade Federal de Goiás, Alameda Palmeiras, Quadra D, Câmpus Samambaia, Goiânia, GO, CEP 74690-900, Brazil. rmbweb@gmail.com.

PMID: 31900818
DOI: 10.1007/s11517-019-02100-z

Abstract

Early diagnosis and treatment are the most important strategies for preventing deaths from several diseases. In this regard, data mining and machine learning techniques have been useful tools for minimizing errors and providing information for diagnosis. This paper presents a new feature selection algorithm. To validate the study, we used eight benchmark data sets that are commonly used by researchers developing machine learning methods for medical data classification. The experiments show that the proposed feature selection method combined with a twin-bounded support vector machine (FSTBSVM) performs very efficiently. The robustness of FSTBSVM is examined using classification accuracy, sensitivity, and specificity. The proposed FSTBSVM is a very promising technique for classification, and the results show that it can produce good results with fewer features than the original data sets contain.

Graphical abstract: Model using a new feature selection and grid search with 10-fold CV to optimize model parameters in our FSTBSVM.
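The three evaluation measures named in the abstract (classification accuracy, sensitivity, and specificity) can be illustrated with a minimal sketch for the binary case. This is not the authors' code: the function name `binary_metrics`, the label convention (1 = diseased, 0 = healthy), and the toy labels are assumptions made purely for illustration, and no values here come from the paper.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for a binary classifier.

    Hypothetical helper for illustration; `positive` marks the diseased class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # diseased, correctly flagged
        elif truth == positive:
            fn += 1  # diseased, missed
        elif pred == positive:
            fp += 1  # healthy, wrongly flagged
        else:
            tn += 1  # healthy, correctly cleared

    total = tp + tn + fp + fn
    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / total if total else math.nan,
        # Share of diseased cases that were detected (true positive rate).
        "sensitivity": tp / (tp + fn) if (tp + fn) else math.nan,
        # Share of healthy cases that were cleared (true negative rate).
        "specificity": tn / (tn + fp) if (tn + fp) else math.nan,
    }


if __name__ == "__main__":
    # Toy labels, invented for this example only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

In a 10-fold cross-validation setting such as the one mentioned in the graphical abstract, these measures would typically be computed on each held-out fold and then averaged; how the paper aggregates them is not stated in the abstract.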

Keywords: Classification; Data mining; Feature selection; Medical data set; Twin-bounded support vector machine.

MeSH terms

Databases as Topic
Female
Humans
Neural Networks, Computer
Support Vector Machine*