Radial-Based Oversampling for Multiclass Imbalanced Data Classification

IEEE Trans Neural Netw Learn Syst. 2020 Aug;31(8):2818-2831. doi: 10.1109/TNNLS.2019.2913673. Epub 2019 Jun 21.

Abstract

Learning from imbalanced data is among the most popular topics in the contemporary machine learning. However, the vast majority of attention in this field is given to binary problems, while their much more difficult multiclass counterparts are relatively unexplored. Handling data sets with multiple skewed classes poses various challenges and calls for a better understanding of the relationship among classes. In this paper, we propose multiclass radial-based oversampling (MC-RBO), a novel data-sampling algorithm dedicated to multiclass problems. The main novelty of our method lies in using potential functions for generating artificial instances. We take into account information coming from all of the classes, contrary to existing multiclass oversampling approaches that use only minority class characteristics. The process of artificial instance generation is guided by exploring areas where the value of the mutual class distribution is very small. This way, we ensure a smart oversampling procedure that can cope with difficult data distributions and alleviate the shortcomings of existing methods. The usefulness of the MC-RBO algorithm is evaluated on the basis of extensive experimental study and backed-up with a thorough statistical analysis. Obtained results show that by taking into account information coming from all of the classes and conducting a smart oversampling, we can significantly improve the process of learning from multiclass imbalanced data.