Machine learning approaches for estimation of prediction interval for the model output

Durga L Shrestha; Dimitri P Solomatine

doi:10.1016/j.neunet.2006.01.012

Neural Netw. 2006 Mar;19(2):225-35. doi: 10.1016/j.neunet.2006.01.012. Epub 2006 Mar 10.

Authors

Durga L Shrestha¹, Dimitri P Solomatine

Affiliation

¹ Department of Hydroinformatics and Knowledge Management, UNESCO-IHE Institute for Water Education, P.O. Box 3015, 2601 DA Delft, The Netherlands. d.shrestha@unesco-ihe.org

PMID: 16530384

Abstract

A novel method for estimating prediction uncertainty using machine learning techniques is presented. Uncertainty is expressed in the form of two quantiles (constituting the prediction interval) of the underlying distribution of prediction errors. The idea is to partition the input space into different zones or clusters having similar model errors using fuzzy c-means clustering. The prediction interval is constructed for each cluster on the basis of the empirical distribution of the errors associated with all instances belonging to that cluster, and is then propagated from each cluster to the examples according to their membership grades in each cluster. A regression model is then built for in-sample data using the computed prediction limits as targets, and finally this model is applied to estimate the prediction intervals (limits) for out-of-sample data. The method was tested on artificial and real hydrologic data sets using various machine learning techniques. Preliminary results show that the method outperforms other methods for estimating the prediction interval. A new method for evaluating the performance of prediction interval estimation is proposed as well.
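
The steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the number of clusters (3), the fuzzifier m = 2, the 90% interval, the membership-weighted quantile rule, and the membership-weighted average used to propagate cluster limits to examples are all assumptions made here for illustration. A plain linear least-squares fit stands in for the machine learning regressors used in the paper, and the helper names (fuzzy_c_means, weighted_quantile, fit_linear, predict_linear) are hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means: returns memberships U (n, c) and centers (c, d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance of every example to every center (small offset avoids 0).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

def weighted_quantile(values, weights, q):
    """Empirical quantile where each value counts in proportion to its weight."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w) / w.sum()
    idx = np.searchsorted(cum, q)
    return v[min(idx, len(v) - 1)]

def fit_linear(X, y):
    """Least-squares linear model with intercept (stand-in regressor)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

# Assumed synthetic in-sample data: one input, and residuals of some
# already-trained model whose error spread grows with the input.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(500, 1))
errors = rng.normal(0.0, 0.1 + X[:, 0])

# 1) Partition the input space with fuzzy c-means.
n_clusters = 3
U, centers = fuzzy_c_means(X, c=n_clusters)

# 2) Per-cluster error quantiles for a 90% interval.
alpha = 0.10
lo_k = np.array([weighted_quantile(errors, U[:, k], alpha / 2)
                 for k in range(n_clusters)])
hi_k = np.array([weighted_quantile(errors, U[:, k], 1 - alpha / 2)
                 for k in range(n_clusters)])

# 3) Propagate cluster limits to each example by membership grade.
lower_target = U @ lo_k
upper_target = U @ hi_k

# 4) Fit regression models that map inputs to the two limits.
coef_lo = fit_linear(X, lower_target)
coef_hi = fit_linear(X, upper_target)

# 5) Apply the models to out-of-sample inputs and check coverage.
X_new = rng.uniform(0.0, 1.0, size=(1000, 1))
err_new = rng.normal(0.0, 0.1 + X_new[:, 0])
lower_hat = predict_linear(coef_lo, X_new)
upper_hat = predict_linear(coef_hi, X_new)
coverage = np.mean((err_new >= lower_hat) & (err_new <= upper_hat))
print(f"out-of-sample coverage of the nominal 90% interval: {coverage:.3f}")
```

The limits here are limits on the model error; adding them to the model's point prediction gives the prediction interval for the model output. The fraction of out-of-sample errors falling inside the estimated limits is one simple way to judge the result, and it is used here only as a sanity check, not as the evaluation measure proposed in the paper.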

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Artificial Intelligence*
Cluster Analysis
Computer Simulation*
Ecosystem
Evaluation Studies as Topic
Fuzzy Logic
Neural Networks, Computer*
Nonlinear Dynamics
Predictive Value of Tests*
Reproducibility of Results
Time