Using T3, an improved decision tree classifier, for mining stroke-related medical data

Methods Inf Med. 2007;46(5):523-9. doi: 10.1160/me0317.

Abstract

Objectives: Medical data are a valuable resource from which novel and potentially useful knowledge can be discovered by using data mining. Data mining can assist and support medical decision making and enhance clinical management and investigative research. The objective of this work is to propose a method for building accurate descriptive and predictive models based on classification of past medical data. We also aim to compare this method with other well-established data mining methods and to identify its strengths and weaknesses.

Method: We propose T3, a decision tree classifier that builds predictive models from known classes and allows a certain amount of misclassification error in training in order to achieve better descriptive and predictive accuracy. We then experiment with a real medical data set on stroke, and various subsets of it, in order to identify strengths and weaknesses. We also compare its performance with that of a very successful and well-established decision tree classifier.

Results: T3 demonstrated impressive performance when predicting unseen cases of stroke, with as little as 0.4% classification error, whereas the state-of-the-art decision tree classifier produced 33.6% classification error.
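
For readers unfamiliar with the metric, classification error is conventionally the percentage of unseen cases assigned the wrong class. The abstract does not give the exact evaluation protocol, so the following is only a minimal sketch under that assumption; the function name and the toy labels are hypothetical and are not taken from the stroke register data.

```python
def classification_error(predicted, actual):
    """Percentage of cases whose predicted class differs from the true class.

    Assumes the conventional definition of classification error; the
    paper's exact evaluation protocol is not stated in the abstract.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return 100.0 * wrong / len(actual)


# Hypothetical toy labels (not data from the study): 250 cases, 1 misclassified.
actual = ["stroke"] * 125 + ["no stroke"] * 125
predicted = list(actual)
predicted[0] = "no stroke"
error_percent = classification_error(predicted, actual)
```

With these illustrative labels, one wrong case out of 250 corresponds to an error of about 0.4%, the same order as the best figure reported above.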

Conclusions: This paper presents and evaluates T3, a classification algorithm that builds decision trees of depth at most three and achieves high accuracy whilst keeping the tree size reasonably small. T3 demonstrates strong descriptive and predictive power without compromising simplicity and clarity. We evaluate T3 on real stroke register data and compare it with C4.5, a well-known classification algorithm, showing that T3 produces significantly more accurate and readable classifiers.
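
The two ideas named in the abstract, a depth cap of three and a tolerated amount of training error, can be illustrated with a generic depth-limited tree. This is not the T3 algorithm, whose splitting rules are not described here. It is a minimal sketch that assumes numeric features, binary threshold splits chosen by training misclassification count, and a hypothetical max_error parameter that stops splitting once a node's training error is small enough.

```python
from collections import Counter


def majority(labels):
    """Return the most common label and the fraction of labels that differ from it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, 1.0 - count / len(labels)


def build_tree(rows, labels, depth=0, max_depth=3, max_error=0.1):
    """Grow a binary tree, stopping at max_depth or when the node's
    training error is at most max_error (a hypothetical tolerance)."""
    label, error = majority(labels)
    if depth == max_depth or error <= max_error:
        return {"leaf": label}
    best = None
    for f in range(len(rows[0])):
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            # Training errors if each side predicts its own majority class.
            wrong = sum(len(s) - Counter(s).most_common(1)[0][1]
                        for s in (left, right))
            if best is None or wrong < best[0]:
                best = (wrong, f, t)
    if best is None:  # every feature is constant, so no split is possible
        return {"leaf": label}
    _, f, t = best
    left_idx = [i for i, r in enumerate(rows) if r[f] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left_idx],
                           [labels[i] for i in left_idx],
                           depth + 1, max_depth, max_error),
        "right": build_tree([rows[i] for i in right_idx],
                            [labels[i] for i in right_idx],
                            depth + 1, max_depth, max_error),
    }


def predict(tree, row):
    """Follow threshold tests from the root until a leaf is reached."""
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]


def tree_depth(tree):
    """Number of tests on the longest root-to-leaf path."""
    if "leaf" in tree:
        return 0
    return 1 + max(tree_depth(tree["left"]), tree_depth(tree["right"]))


# Hypothetical toy data, not taken from the stroke register.
rows = [[1], [2], [3], [4]]
labels = ["a", "a", "b", "b"]
tree = build_tree(rows, labels)
```

Raising max_error makes the sketch stop earlier and produce smaller trees at the cost of more training mistakes, which is the general trade-off the abstract alludes to.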

MeSH terms

  • Algorithms
  • Decision Support Techniques
  • Decision Trees*
  • Female
  • Humans
  • Information Management
  • Information Storage and Retrieval / methods*
  • Information Systems
  • Knowledge Bases
  • Male
  • Models, Statistical
  • Stroke*