Selective oversampling approach for strongly imbalanced data

Peter Gnip; Liberios Vokorokos; Peter Drotár

doi:10.7717/peerj-cs.604

PeerJ Comput Sci. 2021 Jun 18:7:e604. doi: 10.7717/peerj-cs.604. eCollection 2021.

Authors

Peter Gnip¹, Liberios Vokorokos¹, Peter Drotár¹

Affiliation

¹ Department of Computers and Informatics, Technical University of Košice, Slovak Republic.

Abstract

Imbalanced data pose challenges in many real-world applications. One way to improve classifier performance on imbalanced data is oversampling. In this paper, we propose a new selective oversampling approach (SOA) that first isolates the most representative samples from the minority classes using an outlier detection technique and then uses only these samples for synthetic oversampling. We show that the proposed approach improves the performance of two state-of-the-art oversampling methods, namely the synthetic minority oversampling technique (SMOTE) and adaptive synthetic sampling (ADASYN). Prediction performance is evaluated on four synthetic datasets and four real-world datasets; in every case, the proposed SOA methods matched or outperformed the other existing oversampling methods considered.
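The two-stage idea described above (filter the minority class with an outlier detector, then oversample only the retained samples) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the outlier detection technique, so the sketch assumes a simple stand-in score, the mean distance to the k nearest minority neighbours, and it uses a plain SMOTE-style interpolation. The function names and the parameters `k` and `keep_fraction` are hypothetical, and the sketch assumes the minority class contains more than `k` samples.

```python
import math
import random


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return [j for _, j in dists[:k]]


def select_representative(minority, k=3, keep_fraction=0.8):
    """Keep the minority samples with the lowest outlier score.

    Assumption: the score is the mean distance to the k nearest minority
    neighbours. It stands in for whichever detector the paper uses.
    """
    scores = []
    for i in range(len(minority)):
        nbrs = knn_indices(minority, i, k)
        scores.append(sum(math.dist(minority[i], minority[j]) for j in nbrs) / k)
    # Keep at least k + 1 samples so the oversampler still has k neighbours.
    n_keep = max(k + 1, int(round(keep_fraction * len(minority))))
    order = sorted(range(len(minority)), key=scores.__getitem__)
    return [minority[i] for i in order[:n_keep]]


def smote_like(samples, n_new, k=3, rng=None):
    """SMOTE-style step: interpolate between a sample and one of its neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        j = rng.choice(knn_indices(samples, i, k))
        gap = rng.random()  # position along the segment from sample i to j
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(samples[i], samples[j]))
        )
    return synthetic


def selective_oversample(minority, n_new, k=3, keep_fraction=0.8, rng=None):
    """Filter out likely outliers first, then oversample only the retained core."""
    core = select_representative(minority, k, keep_fraction)
    return smote_like(core, n_new, k, rng)


# Toy minority class: a tight cluster plus one isolated point.
minority = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (10.0, 10.0),
]
core = select_representative(minority)
new_points = selective_oversample(minority, n_new=20)
```

In this toy example the isolated point is dropped before interpolation, so no synthetic sample is generated along a segment towards it. With unfiltered SMOTE, that point could serve as a seed and pull synthetic samples away from the cluster.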

Keywords: ADASYN; Bankruptcy prediction; Imbalanced data; Outlier detection; Oversampling; SMOTE.

Grants and funding

This work was supported by the Slovak Research and Development Agency under contract no. APVV-16-0211 and by the Ministry of Education, Science, Research and Sport of the Slovak Republic under contract no. VEGA 1/0327/20. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.