Prediction intervals with random forests

Stat Methods Med Res. 2020 Jan;29(1):205-229. doi: 10.1177/0962280219829885. Epub 2019 Feb 21.

Abstract

The classical and most commonly used approach to building prediction intervals is the parametric approach. Its main drawback, however, is that its validity and performance depend heavily on the assumed functional link between the covariates and the response. This research investigates new methods that improve the performance of prediction intervals built with random forests. Two aspects are explored: the method used to build the forest and the method used to build the prediction interval. Four methods to build the forest are investigated: three from the classification and regression tree (CART) paradigm, plus the transformation forest method. For CART forests, two alternative splitting criteria are investigated in addition to the default least-squares splitting rule. We also present and evaluate the performance of five flexible methods for constructing prediction intervals. Combining the four forest-building methods with the five interval methods yields 20 distinct method variations. To reliably attain the desired confidence level, we include a calibration procedure performed on the out-of-bag information provided by the forest. The 20 method variations are thoroughly investigated and compared to five alternative methods through simulation studies and in real data settings. The results show that the proposed methods are very competitive: they outperform commonly used methods both in simulation settings and on real data.
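The abstract does not spell out the calibration procedure, so what follows is only a minimal sketch of the general idea of out-of-bag (OOB) calibration under stated assumptions, not the authors' algorithm. It assumes that, for every training observation i, an already-fitted forest supplies an OOB "conditional sample" of plausible responses (for example, training responses collected only from trees in which i was out-of-bag, in the spirit of quantile regression forests); building that sample is not shown. Intervals are taken as equal-tailed empirical quantiles of that sample, and the nominal level is tuned by bisection until the OOB coverage reaches the target. The function names (`empirical_quantile`, `central_interval`, `oob_coverage`, `calibrate_level`) are hypothetical, and the demo data are synthetic stand-ins for forest output.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating OOB calibration of a nominal level.
# Assumption: samples[i] is an out-of-bag conditional sample for training
# observation i produced by some already-fitted forest (not built here).


def empirical_quantile(values: Sequence[float], p: float) -> float:
    """Sample quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty sample")
    h = (len(xs) - 1) * p
    lo, hi = math.floor(h), math.ceil(h)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def central_interval(sample: Sequence[float], level: float) -> Tuple[float, float]:
    """Equal-tailed interval between the (1-level)/2 and (1+level)/2 quantiles."""
    a = (1.0 - level) / 2.0
    return empirical_quantile(sample, a), empirical_quantile(sample, 1.0 - a)


def oob_coverage(samples: List[Sequence[float]], y: Sequence[float], level: float) -> float:
    """Fraction of training responses falling inside their own OOB interval."""
    hits = 0
    for sample, yi in zip(samples, y):
        lo, hi = central_interval(sample, level)
        if lo <= yi <= hi:
            hits += 1
    return hits / len(y)


def calibrate_level(samples: List[Sequence[float]], y: Sequence[float],
                    target: float, tol: float = 1e-6) -> float:
    """Smallest nominal level whose OOB coverage reaches `target`, by bisection.

    Valid because intervals are nested in `level`, so coverage is monotone.
    """
    lo, hi = 0.0, 1.0
    if oob_coverage(samples, y, hi) < target:
        return hi  # even the widest interval under-covers; return the maximum
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if oob_coverage(samples, y, mid) >= target:
            hi = mid  # target met: try a narrower interval
        else:
            lo = mid  # target missed: need a wider interval
    return hi


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for forest output: OOB samples have sd 1 while the true
    # noise has sd 1.2, so the nominal 90% interval under-covers.
    samples = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(600)]
    y = [rng.gauss(0, 1.2) for _ in range(600)]
    print("OOB coverage at nominal 0.90:", oob_coverage(samples, y, 0.90))
    level = calibrate_level(samples, y, 0.90)
    print("calibrated level:", round(level, 4),
          "OOB coverage:", oob_coverage(samples, y, level))
```

Because raising the nominal level can only widen each interval, OOB coverage is non-decreasing in the level, which is what makes bisection a sound search here. In this sketch the calibrated level would then be reused, unchanged, when building intervals for new observations.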

Keywords: Random forest; out-of-bag calibration; prediction interval; splitting rule.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Calibration
  • Computer Simulation
  • Forecasting
  • Humans
  • Machine Learning
  • Models, Statistical*
  • Research Design