Examining different cost ratio frameworks for decision rule machine learning algorithms in diagnostic application

Technol Health Care. 2024 Feb 7. doi: 10.3233/THC-231946. Online ahead of print.

Abstract

Background: Artificial Intelligence (AI) plays a pivotal role in diagnosing health conditions, from general well-being to critical illness. An often overlooked but critical aspect of health diagnostics is cost-sensitive learning, which this study prioritizes over the non-invasive nature of the diagnostic process. Standard metrics such as accuracy and sensitivity give only a limited picture of a classifier's error profile, because they do not account for the different costs of false positives and false negatives.

Objective: This research investigates the total cost of misclassification (Total Cost) incurred by decision rule Machine Learning (ML) algorithms implemented on the Java platform, namely DecisionTable, JRip, OneR, and PART. The classifiers are trained under supervised learning on an augmented dataset that combines conjunctiva images with candidates' demographic and anthropometric features, with specific emphasis on cost-sensitive classification.
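As an illustration of the Total Cost metric, the sketch below sums a per-sample cost looked up in a 2x2 cost matrix. It assumes labels 1 = anemia present and 0 = anemia absent, a matrix indexed as cost_matrix[actual][predicted], and zero cost for correct predictions; these conventions are assumptions for illustration, not details given in the abstract. Under them, Total Cost reduces to FP x C_FP + FN x C_FN.

```python
# Minimal sketch of Total Cost of misclassification for a binary task.
# Assumed conventions (not from the abstract): label 1 = anemia present,
# label 0 = anemia absent, cost_matrix indexed [actual][predicted],
# diagonal (correct predictions) typically 0.

def total_cost(y_true, y_pred, cost_matrix):
    """Sum cost_matrix[actual][predicted] over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(cost_matrix[t][p] for t, p in zip(y_true, y_pred))


# Usage: one false positive (cost 1) and one false negative (cost 5).
cost = [[0, 1],   # actual absent:  [predicted absent, predicted present]
        [5, 0]]   # actual present: [predicted absent, predicted present]
print(total_cost([1, 0, 1, 0], [1, 1, 0, 0], cost))
```

With a zero diagonal, only the off-diagonal cells contribute, which is why a classifier with high accuracy can still carry a high Total Cost if its few errors are mostly the expensive kind.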

Methods: The selected decision rule classifiers take as input attributes the text features together with an image feature, the a* value of the CIELAB color space, extracted from the digital conjunctiva images. Pre-processing combines the text and image features and normalizes them to a uniform scale. The samples are then classified into two categories, presence or absence of anemia, using 10-fold cross-validation. This study uses the Cost Ratio (ρ) derived from the cost matrix to monitor the Total Cost under four cost ratio methodologies: Uniform (U), Uniform Inverted (UI), Non-Uniform (NU), and Non-Uniform Inverted (NUI).
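The relation between the cost matrix, ρ, and Total Cost can be sketched as follows. This assumes ρ = C_FN / C_FP (the cost of missing anemia relative to a false alarm), a matrix indexed [actual][predicted], and zero cost for correct predictions; the abstract does not state which direction of the ratio is used, nor how the U, UI, NU, and NUI matrices are built. The helper swap_off_diagonal is hypothetical: it only shows that exchanging the two error costs yields the reciprocal ratio, which is one plausible but unconfirmed reading of "Inverted".

```python
# Sketch: cost ratio rho from a 2x2 cost matrix indexed [actual][predicted].
# Assumption: rho = C_FN / C_FP; some texts define the reciprocal.

def cost_ratio(cost_matrix):
    c_fp = cost_matrix[0][1]  # actual absent, predicted present
    c_fn = cost_matrix[1][0]  # actual present, predicted absent
    return c_fn / c_fp


def swap_off_diagonal(cost_matrix):
    """Hypothetical helper: exchange C_FP and C_FN, giving ratio 1 / rho."""
    return [[cost_matrix[0][0], cost_matrix[1][0]],
            [cost_matrix[0][1], cost_matrix[1][1]]]


def total_cost_for_ratio(fp, fn, rho, c_fp=1.0):
    """Total Cost when C_FP = c_fp, C_FN = rho * c_fp, correct predictions cost 0."""
    return fp * c_fp + fn * rho * c_fp


# Usage: the same error counts (3 FP, 2 FN) priced under different ratios.
for rho in (1.0, 5.0, 0.2):
    print(rho, total_cost_for_ratio(fp=3, fn=2, rho=rho))
```

Holding the error counts fixed and varying ρ shows why a classifier's ranking can change between cost frameworks: a model with few false negatives benefits most when ρ is large.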

Results: The PART classifier was the top performer in this binary classification task, yielding the lowest mean Total Cost (629.9) among the selected classifiers. It also showed a comparatively low standard deviation (335.9) and a lower Total Cost under all four cost ratio methodologies. The classifiers rank as follows: PART, JRip, DecisionTable, and OneR.
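A ranking of this kind can be produced by summarizing each classifier's Total Cost values and sorting by the mean. The sketch below assumes the mean and standard deviation are taken over the Total Cost values obtained under the different cost ratio settings, and uses the sample standard deviation (statistics.stdev); the abstract does not say which form was reported. The function name rank_by_mean_cost is illustrative, and no values from the study are reproduced here.

```python
# Sketch: rank classifiers by mean Total Cost (lower is better).
# Assumption: sample standard deviation; use statistics.pstdev for the
# population form if that matches the reporting convention.
import statistics


def rank_by_mean_cost(costs_by_classifier):
    """Map classifier name -> list of Total Cost values.

    Returns (name, mean, stdev) tuples sorted by ascending mean.
    """
    summary = []
    for name, costs in costs_by_classifier.items():
        mean = statistics.mean(costs)
        sd = statistics.stdev(costs) if len(costs) > 1 else 0.0
        summary.append((name, mean, sd))
    return sorted(summary, key=lambda row: row[1])
```

Reporting the standard deviation next to the mean distinguishes a classifier that is cheap under every framework from one that is cheap on average but costly under a particular cost ratio.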

Conclusion: The results highlight the value of a cost-sensitive learning approach, with the PART classifier performing consistently within the proposed framework on the anemia dataset. Cost-sensitive learning can improve diagnostic recommendations and may yield substantial cost savings, making it a noteworthy focus for the advancement of AI-driven health care.

Keywords: Cost matrix; DecisionTable; JRip; OneR; PART; cost ratio; cost-sensitive classifier; decision rule classifier.