Bagged k-nearest neighbours classification with uncertainty in the variables

Anal Chim Acta. 2009 Jul 30;646(1-2):62-8. doi: 10.1016/j.aca.2009.05.016. Epub 2009 May 21.

Abstract

An analytical result should be expressed as x ± U, where x is the experimental result obtained for a given variable and U is its uncertainty. This uncertainty is rarely taken into account in supervised classification. In this paper, we propose to include the uncertainty of the experimental results in the computation of the reliability of classification. The method combines k-nearest neighbours (kNN) with a nested bootstrap scheme, in which new bootstrap training sets are generated using the classical bootstrap at the first level (B times) and a new bootstrap method, called U-bootstrap, at the second level (D times). The two bootstrap levels reduce the effect of sampling at the first level and the effect of the uncertainty at the second. These B×D bootstrap training sets are used to compute, with kNN, the reliability of classification for an unknown object, which is then assigned to the class with the highest reliability. In this method, unlike classical kNN and Probabilistic Bagged k-nearest neighbours (PBkNN), the reliability of classification changes (increases or decreases) as the uncertainty increases, depending on the position of the unknown object with respect to the training objects. For the benchmark Wine dataset, the classification error rate (CER) was similar to that of kNN (5.57%), but lower than that of PBkNN using Hamamoto's bootstrap (7.96%) or Efron's bootstrap (8.97%).
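To make the nested scheme concrete, the following is a minimal Python sketch of the procedure described above: an outer Efron bootstrap of the training objects (B times), an inner perturbation of each resampled measurement within its stated uncertainty (D times), and kNN voting across the B×D sets to obtain per-class reliabilities. The function names are hypothetical, and the inner step assumes the U-bootstrap draws each value uniformly within x ± U; the paper's exact U-bootstrap resampling scheme is not specified in the abstract.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify a single object x by majority vote of its k nearest neighbours."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(d)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

def bagged_knn_reliability(X, y, U, x, k=3, B=50, D=20, seed=None):
    """Reliability of classifying x from B x D nested bootstrap training sets.

    X : (n, p) training measurements;  y : (n,) class labels
    U : (n, p) stated uncertainties of the training measurements
    Returns a dict mapping each class label to its reliability (vote fraction).
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    votes = []
    for _ in range(B):                    # first level: classical (Efron) bootstrap
        idx = rng.integers(0, n, size=n)  # resample objects with replacement
        Xb, yb, Ub = X[idx], y[idx], U[idx]
        for _ in range(D):                # second level: "U-bootstrap" (assumed here
            # to perturb each value uniformly within its uncertainty interval)
            Xd = Xb + rng.uniform(-Ub, Ub)
            votes.append(knn_predict(Xd, yb, x, k))
    labels, counts = np.unique(votes, return_counts=True)
    return dict(zip(labels, counts / (B * D)))
```

Under this sketch, the unknown object would be assigned to the class with the highest reliability, e.g. max(reliability, key=reliability.get); because the inner level resamples within the measurement uncertainties, the reliability can rise or fall as U grows, depending on where the unknown object lies relative to the training objects.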