A Novel Classification Method: Neighborhood-Based Positive Unlabeled Learning Using Decision Tree (NPULUD)

Bita Ghasemkhani; Kadriye Filiz Balbal; Kokten Ulas Birant; Derya Birant

doi:10.3390/e26050403

Entropy (Basel). 2024 May 4;26(5):403. doi: 10.3390/e26050403.

Authors

Bita Ghasemkhani¹, Kadriye Filiz Balbal², Kokten Ulas Birant³ ⁴, Derya Birant⁴

Affiliations

¹ Graduate School of Natural and Applied Sciences, Dokuz Eylul University, Izmir 35390, Turkey.
² Department of Computer Science, Dokuz Eylul University, Izmir 35390, Turkey.
³ Information Technologies Research and Application Center (DEBTAM), Dokuz Eylul University, Izmir 35390, Turkey.
⁴ Department of Computer Engineering, Dokuz Eylul University, Izmir 35390, Turkey.

Abstract

In a standard binary supervised classification task, both negative and positive samples must be present in the training dataset to construct a classification model. However, this condition is not met in certain applications where samples of only one class are obtainable. To overcome this problem, a different classification method, one that learns from positive and unlabeled (PU) data, must be used. In this study, a novel method is presented: neighborhood-based positive unlabeled learning using decision tree (NPULUD). NPULUD first applies a nearest-neighborhood approach for the PU strategy and then employs a decision tree algorithm, based on the entropy measure, for the classification task. Entropy plays a pivotal role in assessing the level of uncertainty in the training dataset as the decision tree is built for classification. We validated our method experimentally on 24 real-world datasets. The proposed method attained an average accuracy of 87.24%, while the traditional supervised learning approach obtained an average accuracy of 83.99% on the same datasets. We also demonstrate that our method obtained a statistically notable improvement (7.74% on average) over state-of-the-art peers.
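The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how neighbors are turned into labels, so the majority-vote rule in `knn_pseudo_labels`, all function names, and the toy data are assumptions made for illustration. Only the Shannon entropy and information-gain formulas are standard.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting one numeric feature at threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def knn_pseudo_labels(positives, unlabeled, k=3):
    """Assumed PU rule: an unlabeled point becomes 1 when more than half of its
    k nearest neighbours are known positives, otherwise 0."""
    pool = [(p, 1) for p in positives] + [(u, 0) for u in unlabeled]
    offset = len(positives)
    result = []
    for i, u in enumerate(unlabeled):
        # Distances to every other point; the point itself is skipped by index.
        neighbours = sorted(
            (math.dist(u, q), tag)
            for j, (q, tag) in enumerate(pool)
            if j != offset + i
        )
        votes = sum(tag for _, tag in neighbours[:k])
        result.append(1 if 2 * votes > k else 0)
    return result


# Toy data (hypothetical): one cluster near (1, 1), another near (5, 5).
positives = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
unlabeled = [(1.1, 1.0), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]

pseudo = knn_pseudo_labels(positives, unlabeled, k=3)
points = positives + unlabeled
labels = [1] * len(positives) + pseudo

# Gain of splitting the first feature at 3.0, as a tree's root test might.
gain = information_gain([p[0] for p in points], labels, threshold=3.0)
print(pseudo, round(entropy(labels), 3), round(gain, 3))
```

In this toy setting the pseudo-labeled set separates cleanly on the first feature, so the split removes all of the label entropy. A decision tree that uses the entropy criterion would choose such a split at its root; real datasets rarely separate this cleanly.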

Keywords: artificial intelligence; classification; decision tree; entropy measure; k-nearest neighbors; machine learning; positive unlabeled learning; supervised learning.

Grants and funding

This research received no external funding.