Random forest construction with robust semisupervised node splitting

IEEE Trans Image Process. 2015 Jan;24(1):471-83. doi: 10.1109/TIP.2014.2378017. Epub 2014 Dec 4.

Abstract

Random forest (RF) is a very important classifier with applications in various machine learning tasks, but its promising performance heavily relies on the size of labeled training data. In this paper, we investigate constructing of RFs with a small size of labeled data and find that the performance bottleneck is located in the node splitting procedures; hence, existing solutions fail to properly partition the feature space if there are insufficient training data. To achieve robust node splitting with insufficient data, we present semisupervised splitting to overcome this limitation by splitting nodes with the guidance of both labeled and abundant unlabeled data. In particular, an accurate quality measure of node splitting is obtained by carrying out the kernel-based density estimation, whereby a multiclass version of asymptotic mean integrated squared error criterion is proposed to adaptively select the optimal bandwidth of the kernel. To avoid the curse of dimensionality, we project the data points from the original high-dimensional feature space onto a low-dimensional subspace before estimation. A unified optimization framework is proposed to select a coupled pair of subspace and separating hyperplane such that the smoothness of the subspace and the quality of the splitting are guaranteed simultaneously. Our algorithm efficiently avoids overfitting caused by bad initialization and local maxima when compared with conventional margin maximization-based semisupervised methods. We demonstrate the effectiveness of the proposed algorithm by comparing it with state-of-the-art supervised and semisupervised algorithms for typical computer vision applications, such as object categorization, face recognition, and image segmentation, on publicly available data sets.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Biometric Identification
  • Decision Trees
  • Face / anatomy & histology
  • Humans
  • Image Processing, Computer-Assisted / methods*
  • Pattern Recognition, Automated / methods*