Entropy and Confidence-Based Undersampling Boosting Random Forests for Imbalanced Problems

IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5178-5191. doi: 10.1109/TNNLS.2020.2964585. Epub 2020 Nov 30.

Abstract

In this article, we propose a novel entropy and confidence-based undersampling boosting (ECUBoost) framework to solve imbalanced problems. The boosting-based ensemble is combined with a new undersampling method to improve the generalization performance. To avoid losing informative samples during the data preprocessing of the boosting-based ensemble, both confidence and entropy are used in ECUBoost as benchmarks to ensure the validity and structural distribution of the majority samples during the undersampling. Furthermore, different from other iterative dynamic resampling methods, ECUBoost based on confidence can be applied to algorithms without iterations such as decision trees. Meanwhile, random forests are used as base classifiers in ECUBoost. Furthermore, experimental results on both artificial data sets and KEEL data sets prove the effectiveness of the proposed method.