Prediction and analysis of likelihood of freeway crash occurrence considering risky driving behavior

Accid Anal Prev. 2023 Nov:192:107244. doi: 10.1016/j.aap.2023.107244. Epub 2023 Aug 11.

Abstract

Predicting the likelihood of vehicle crashes is an indispensable component of freeway safety management. Because of data collection limitations, previous studies have mainly used traffic flow-related variables to develop freeway crash prediction models and have rarely considered the effect of risky driving behavior on the likelihood of crashes. This study employed navigation software to collect driving behavior data and integrated multi-source data that include vehicle speed, traffic volume, and congestion index values. The study also employed the coupled 'synthetic minority oversampling technique and edited nearest neighbor' (SMOTE + ENN) method to balance the data. Three freeway crash likelihood prediction models were built, based on the binomial logit, eXtreme Gradient Boosting (XGBoost), and support vector machine algorithms, respectively. The Shapley additive explanation (SHAP) algorithm was used to explore the effect of each feature variable on the likelihood of crashes. The XGBoost model achieved the highest prediction accuracy of the three models. Under the optimal control-to-case ratio (1:1), the prediction accuracy of the XGBoost model reached 0.96, and the recall rate, specificity, and area-under-the-curve values were 0.86, 0.96, and 0.907, respectively. Comparative tests demonstrate that ranking risky driving behavior into three levels of intensity effectively improves the predictive accuracy of the XGBoost model. Moreover, the XGBoost model with a ten-minute time step outperformed the XGBoost model with a five-minute time step in terms of prediction accuracy. The SHAP-based analysis shows that the likelihood of freeway crashes is high when the traffic congestion level is high and when vehicle speeds in the upstream roadway section vary widely. Both sharp acceleration and sharp deceleration also increase the likelihood of crashes. This paper aims to provide an effective framework for predicting and interpreting the likelihood of freeway crashes, thereby providing guidance for crash prevention, driver training, and the development of traffic regulations.
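
The accuracy, recall rate, and specificity quoted in the abstract are standard confusion-matrix metrics for a binary crash / non-crash classifier. The following is a minimal sketch of how such values are typically computed, not the authors' evaluation code. The ConfusionCounts container, the function names, and the counts are all hypothetical; the counts are made-up illustration data and are not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary classification outcomes."""
    tp: int  # crash intervals predicted as crash
    fn: int  # crash intervals predicted as non-crash
    tn: int  # non-crash intervals predicted as non-crash
    fp: int  # non-crash intervals predicted as crash


def accuracy(c: ConfusionCounts) -> float:
    """Share of all intervals that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Share of actual crash intervals that the model flagged."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of actual non-crash intervals that the model cleared."""
    return c.tn / (c.tn + c.fp)


# Made-up counts for illustration only (assumption, not data from the paper).
counts = ConfusionCounts(tp=43, fn=7, tn=192, fp=8)

if __name__ == "__main__":
    print(f"accuracy    = {accuracy(counts):.2f}")
    print(f"recall      = {recall(counts):.2f}")
    print(f"specificity = {specificity(counts):.2f}")
```

Recall and specificity are reported alongside accuracy because crash intervals are rare compared with non-crash intervals, so accuracy alone can look high even when many crashes are missed.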

Keywords: Crash likelihood prediction; Freeway; Risky driving behavior; Shapley additive explanation (SHAP); XGBoost.

MeSH terms

  • Accidents, Traffic / prevention & control
  • Algorithms
  • Automobile Driving*
  • Humans
  • Probability
  • Safety Management