A hybrid cost-sensitive ensemble for imbalanced breast thermogram classification

Artif Intell Med. 2015 Nov;65(3):219-27. doi: 10.1016/j.artmed.2015.07.005. Epub 2015 Jul 31.

Abstract

Objectives: Early recognition of breast cancer, the most commonly diagnosed form of cancer in women, is of crucial importance, given that it leads to significantly improved chances of survival. Medical thermography, which uses an infrared camera for thermal imaging, has been demonstrated as a particularly useful technique for early diagnosis, because it can detect smaller tumours than the standard modality of mammography.

Methods and material: We analyse breast thermograms by extracting features describing bilateral symmetries between the two breast areas, and present a classification system for decision making. The costs associated with missing a cancer case are much higher than those for mislabelling a benign case. At the same time, datasets contain significantly fewer malignant cases than benign ones. Standard classification approaches fail to consider either of these aspects. To address this challenging problem, we introduce a hybrid cost-sensitive classifier ensemble. Our approach uses a pool of cost-sensitive decision trees which assign a higher misclassification cost to the malignant class, thereby boosting its recognition rate. A genetic algorithm is employed for simultaneous feature selection and classifier fusion. As an optimisation criterion, we use a combination of misclassification cost and diversity to achieve both a high sensitivity and a heterogeneous ensemble. Furthermore, we prune our ensemble by discarding classifiers that contribute minimally to the decision making.
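
The optimisation criterion described above, a combination of misclassification cost and diversity, might be sketched as follows. This is only an illustration under stated assumptions, not the method used in the paper: the cost values, the majority-vote fusion, the pairwise-disagreement diversity measure, the weighting `alpha`, and all function names are hypothetical, and the genetic algorithm that would search over candidate subsets is omitted.

```python
from itertools import combinations

# Assumed cost matrix (not from the paper): a missed malignant case
# is penalised more heavily than a false alarm on a benign case.
COST_FN = 10.0  # malignant case labelled benign
COST_FP = 1.0   # benign case labelled malignant


def majority_vote(votes):
    """Fuse binary votes (1 = malignant, 0 = benign); ties go to malignant."""
    return 1 if 2 * sum(votes) >= len(votes) else 0


def mean_cost(y_true, y_pred):
    """Average misclassification cost per case under the assumed costs."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            total += COST_FN
        elif t == 0 and p == 1:
            total += COST_FP
    return total / len(y_true)


def pairwise_disagreement(member_preds):
    """Mean fraction of cases on which two ensemble members disagree."""
    pairs = list(combinations(member_preds, 2))
    if not pairs:
        return 0.0
    n = len(member_preds[0])
    per_pair = [sum(a != b for a, b in zip(p, q)) / n for p, q in pairs]
    return sum(per_pair) / len(per_pair)


def fitness(y_true, member_preds, selected, alpha=0.5):
    """Score a candidate ensemble (lower is better).

    The score is the cost of the fused decision minus alpha times the
    diversity of the selected members; this particular combination is
    an assumption made for illustration.
    """
    chosen = [member_preds[i] for i in selected]
    fused = [majority_vote(votes) for votes in zip(*chosen)]
    return mean_cost(y_true, fused) - alpha * pairwise_disagreement(chosen)
```

In a sketch like this, a search procedure such as a genetic algorithm would evaluate `fitness` for many candidate values of `selected` and keep the subsets with the lowest score, so that ensembles are rewarded both for avoiding costly missed malignant cases and for containing members that err on different cases.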

Results: For a challenging dataset of about 150 thermograms, our approach achieves an excellent sensitivity of 83.10%, while maintaining a high specificity of 89.44%. It not only improves recognition of malignant cases but also outperforms, with statistical significance, other state-of-the-art algorithms designed for imbalanced classification, and hence provides an effective approach for analysing breast thermograms.
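
For reference, sensitivity and specificity as reported above are the fractions of malignant and benign cases, respectively, that are classified correctly. A minimal sketch of how they can be computed from true and predicted labels is given below; the function name and the label encoding (1 = malignant, 0 = benign) are assumptions made for illustration, and the snippet does not use the data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumes 1 = malignant and 0 = benign, and that both classes
    occur at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malignant, caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malignant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    return tp / (tp + fn), tn / (tn + fp)
```

With the toy labels `y_true = [1, 1, 1, 0, 0, 0, 0]` and `y_pred = [1, 1, 0, 0, 0, 0, 1]`, two of three malignant cases and three of four benign cases are classified correctly, giving a sensitivity of 2/3 and a specificity of 3/4.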

Conclusions: Our proposed hybrid cost-sensitive ensemble can facilitate a highly accurate early diagnosis of breast cancer based on thermogram features. It overcomes the difficulties posed by the imbalanced distribution of patients in the two analysed groups.

Keywords: Breast cancer detection; Classifier ensemble; Cost-sensitive classification; Ensemble pruning; Evolutionary algorithm; Imbalanced classification; Thermogram.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Breast Neoplasms / diagnosis*
  • Cost-Benefit Analysis
  • Decision Trees*
  • Diagnosis, Computer-Assisted / methods*
  • False Negative Reactions
  • False Positive Reactions
  • Female
  • Humans
  • Sensitivity and Specificity
  • Thermography / economics*
  • Thermography / methods*