Improving classification performance with discretization on biomedical datasets

AMIA Annu Symp Proc. 2008 Nov 6;2008:445-9.

Abstract

Discretization transforms the continuous values of a variable into discrete ones and, in doing so, also acts as a variable selection method. Machine learning algorithms such as Support Vector Machines and Random Forests have been used for classification of high-dimensional genomic and proteomic data because they are robust to the dimensionality of the data. We show that discretization can significantly improve the classification performance of these algorithms, as well as that of algorithms such as Naïve Bayes that are sensitive to the dimensionality of the data.
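To illustrate how discretization can double as variable selection, the following is a minimal sketch and not the method used in the paper, which the abstract does not specify. It assumes a supervised, entropy-based binary split per variable and a hypothetical fixed threshold `min_gain` standing in for a principled stopping rule; a variable for which no cut point reaches the threshold is dropped. The function names and the toy data (`marker_a`, `marker_b`) are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_cut(values, labels):
    """Return (cut_point, information_gain) of the best binary split.

    Returns (None, 0.0) when no split improves on the unsplit entropy.
    """
    pairs = sorted(zip(values, labels))
    base = entropy([y for _, y in pairs])
    best_point, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_point, best_gain = (lo + hi) / 2, gain
    return best_point, best_gain


def discretize_and_select(columns, labels, min_gain=0.3):
    """Discretize each column to 0/1 and drop columns with no useful cut.

    min_gain is a hypothetical threshold (an assumption of this sketch).
    Returns {name: (cut_point, discretized_values)} for the kept columns.
    """
    kept = {}
    for name, values in columns.items():
        cut, gain = best_cut(values, labels)
        if cut is not None and gain >= min_gain:
            kept[name] = (cut, [int(v > cut) for v in values])
    return kept


# Toy data: marker_a separates the two classes, marker_b does not.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "marker_a": [1.0, 1.5, 0.5, 3.0, 2.5, 3.5],
    "marker_b": [2.0, 5.0, 3.0, 2.5, 4.5, 3.5],
}
selected = discretize_and_select(columns, labels)
print(sorted(selected))  # only the informative variable survives
```

In this sketch the discretized columns that survive could then be passed to any classifier; the uninformative variable is removed as a side effect of discretization rather than by a separate selection step.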

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Database Management Systems*
  • Databases, Factual*
  • Decision Support Techniques*
  • Pattern Recognition, Automated / methods*