Random Forest (RF) Wrappers for Waveband Selection and Classification of Hyperspectral Data

Appl Spectrosc. 2016 Feb;70(2):322-33. doi: 10.1177/0003702815620545.

Abstract

Hyperspectral data collected using a field spectroradiometer was used to model asymptomatic stress in Pinus radiata and Pinus patula seedlings infected with the pathogen Fusarium circinatum. Spectral data were analyzed using the random forest algorithm. To improve the classification accuracy of the model, subsets of wavebands were selected using three feature selection algorithms: (1) Boruta; (2) recursive feature elimination (RFE); and (3) area under the receiver operating characteristic curve of the random forest (AUC-RF). Results highlighted the robustness of the above feature selection methods when used in conjunction with the random forest algorithm for analyzing hyperspectral data. Overall, the Boruta feature selection algorithm provided the best results. When discriminating F. circinatum stress in Pinus radiata seedlings, Boruta selected wavebands (n = 69) yielded the best overall classification accuracies (training error of 17.00%, independent test error of 17.00% and an AUC value of 0.91). Classification results were, however, significantly lower for P. patula seedlings, with a training error of 24.00%, independent test error of 38.00%, and an AUC value of 0.65. A hybrid selection method that utilizes combinations of wavebands selected from the three feature selection algorithms was also tested. The hybrid method showed an improvement in classification accuracies for P. patula, and no improvement for P. radiata. The results of this study provide impetus towards implementing a hyperspectral framework for detecting stress within nursery environments.

Keywords: Feature selection; Fusarium circinatum; Pinus; RF; Random forest.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Area Under Curve
  • Decision Trees
  • Fusarium / chemistry*
  • Pinus / microbiology*
  • ROC Curve
  • Seedlings / microbiology*
  • Spectrum Analysis / methods*