A multivariate analysis of environmental effects on road accident occurrence using a balanced bagging approach

Accid Anal Prev. 2020 Mar:136:105398. doi: 10.1016/j.aap.2019.105398. Epub 2019 Dec 17.

Abstract

Determining and understanding the environmental factors contributing to road traffic accident occurrence is of core importance in road safety research. In this study, a methodology to obtain robust and unbiased results when modeling imbalanced, high-resolution accident data is described. Based on a data set covering the whole highway network of Austria in a fine spatial (250 m) and temporal (1 h) scale, the effects of 48 covariates on accident occurrence are analyzed, with a special emphasis on real-time weather variables obtained through meteorological re-analysis. A balanced bagging approach is employed to cope with the issue of class imbalance. By fitting different tree-based classifiers to a large number of bootstrapped training samples, ensembles of binary classification models are established. The final prediction is achieved through majority vote across each ensemble, resulting in a robust prediction with reduced variance. Findings show the merits of the proposed approach in terms of model quality and robustness of the results, consistently displaying accuracies around 80% while exhibiting sensitivities of approximately 50%. In addition to certain features related to roadway geometrics, surface condition and traffic volume, a number of weather variables are found to be of importance for predicting accident occurrence. The proposed methodological take may not only pave the way for further analyses of high-resolution road safety data including real-time information, but can also be transferred to any other imbalanced classification problem.
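The balanced bagging procedure described in the abstract (bootstrapped training samples, one tree-based classifier per sample, majority vote across the ensemble) can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the study: each bootstrap sample is balanced by drawing the same number of cases from both classes, which is one common way to realize balanced bagging; a depth-1 decision tree (stump) stands in for the random forest and xgBoost classifiers named in the keywords; and the data are synthetic. All function names are hypothetical, and the printed accuracy and sensitivity say nothing about the values reported for the Austrian data.

```python
import random


def make_data(n_neg, n_pos, rng):
    """Synthetic imbalanced data (assumption): feature 0 is informative, feature 1 is noise."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    X += [[rng.gauss(3, 1), rng.gauss(0, 1)] for _ in range(n_pos)]
    y = [0] * n_neg + [1] * n_pos
    return X, y


def fit_stump(X, y):
    """Fit a depth-1 tree: the (feature, threshold, direction) with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                errors = sum(
                    (1 if sign * (row[j] - t) > 0 else 0) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    return best[1:]


def predict_stump(stump, row):
    j, t, sign = stump
    return 1 if sign * (row[j] - t) > 0 else 0


def balanced_bootstrap(X, y, rng):
    """Draw, with replacement, equally many cases from each class (minority class size)."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    k = min(len(pos), len(neg))
    idx = rng.choices(pos, k=k) + rng.choices(neg, k=k)
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_balanced_bagging(X, y, n_estimators=25, seed=0):
    """Fit one stump per balanced bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(*balanced_bootstrap(X, y, rng)) for _ in range(n_estimators)]


def predict_majority(ensemble, row):
    """Final prediction: class 1 if more than half of the ensemble votes for it."""
    votes = sum(predict_stump(stump, row) for stump in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0


def accuracy_and_sensitivity(y_true, y_pred):
    """Accuracy = share of correct predictions; sensitivity = share of true positives found."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return correct / len(y_true), true_pos / sum(y_true)


data_rng = random.Random(42)
X_train, y_train = make_data(1000, 50, data_rng)  # roughly 5% positives
X_test, y_test = make_data(1000, 50, data_rng)

ensemble = fit_balanced_bagging(X_train, y_train)
y_pred = [predict_majority(ensemble, row) for row in X_test]
acc, sens = accuracy_and_sensitivity(y_test, y_pred)
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
```

Reporting sensitivity next to accuracy matters for imbalanced data: on this synthetic set, a classifier that always predicts "no accident" would reach about 95% accuracy with a sensitivity of zero, which is why the abstract quotes both figures.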

Keywords: Accident analysis; Adverse weather effects; Balanced bagging; Binary classification; Imbalanced data; Random forest; Road safety; xgBoost.

MeSH terms

  • Accidents, Traffic / statistics & numerical data*
  • Austria
  • Built Environment
  • Forecasting / methods
  • Humans
  • Multivariate Analysis
  • Safety
  • Spatial Analysis
  • Weather*