Quantifying and comparing the effects of key risk factors on various types of roadway segment crashes with LightGBM and SHAP

Accid Anal Prev. 2021 Sep:159:106261. doi: 10.1016/j.aap.2021.106261. Epub 2021 Jun 25.

Abstract

Understanding and quantifying the effects of risk factors on crash frequency is of great importance for developing cost-effective safety countermeasures. In this paper, the effects of key crash contributing factors on total crashes and on crashes of different collision types are analyzed separately and compared. A novel Machine Learning (ML) method, Light Gradient Boosting Machine (LightGBM), is introduced to model a Texas dataset consisting of vehicle crashes that occurred from 2015 to 2017. Compared with other commonly used ML methods such as eXtreme Gradient Boosting (XGBoost), LightGBM performs significantly better in terms of mean absolute error (MAE) and root mean squared error (RMSE). In addition, the SHapley Additive exPlanations (SHAP) approach is employed to interpret the LightGBM outputs. Significant risk factors are identified, including speed limit, area type, number of lanes, roadway functional class, shoulder width, and shoulder type. With the SHAP method, the importance, total effects, and main and interaction effects of risk factors are quantified. The results suggest that the importance of risk factors varies across collision types. Speed limit is a more important risk factor than right/left shoulder width, lane width, and median width for Rear-End (RE) crashes, while the opposite relationship is found for Run-Off-Road (ROR) crashes. Narrow lanes (8 ft to 11 ft) are also found to increase the risk for all types of crashes considered in this study (i.e., Total, ROR, and RE). For road segments with 5 or 6 lanes in both directions combined, a lane width greater than or equal to 12 ft may help reduce the risk of all types of crashes. These results have important implications for developing accurate crash modification factors and cost-effective safety countermeasures.
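
To illustrate how SHAP attributes a prediction to individual risk factors, the following is a minimal sketch that computes exact Shapley values by brute force for a toy predictor. Everything in it is an assumption made for illustration: the function `toy_model`, its coefficients, the feature order (speed limit, lane width, number of lanes), and the example segment values are hypothetical and are not taken from the paper or its Texas dataset. The study itself applies SHAP to a trained LightGBM model, for which tree-specific algorithms are normally used instead of enumeration.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical crash-frequency predictor (NOT the paper's model).

    x = [speed_limit_mph, lane_width_ft, number_of_lanes]
    """
    speed, lane_width, lanes = x
    # Invented rule: lanes narrower than 12 ft add risk that grows
    # with the number of lanes (an interaction effect).
    narrow_penalty = 0.5 if lane_width < 12 else 0.0
    return 1.0 + 0.02 * speed + narrow_penalty * lanes


def shapley_values(model, x, background):
    """Exact Shapley values of model at x relative to one background row.

    Features in a coalition take their value from x; all other features
    take their value from the background row.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else background[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical segment to explain and hypothetical reference segment.
segment = [65, 10, 4]      # 65 mph, 10 ft lanes, 4 lanes
reference = [55, 12, 2]    # 55 mph, 12 ft lanes, 2 lanes

phi = shapley_values(toy_model, segment, reference)
base = toy_model(reference)

for name, contribution in zip(["speed limit", "lane width", "lanes"], phi):
    print(f"{name}: {contribution:+.3f}")
print("base + sum of contributions:", base + sum(phi))
print("model prediction:           ", toy_model(segment))
```

The contributions add up to the difference between the prediction for the segment and the prediction for the reference row, which is the additivity property that lets SHAP split a prediction into per-factor effects. Because the toy rule couples lane width with the number of lanes, those two features share the credit for the interaction term, while the speed limit receives only its own additive effect.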

Keywords: Crash frequency; Crash type; LightGBM; Machine learning; SHAP; Safety.

MeSH terms

  • Accidents, Traffic*
  • Humans
  • Risk Factors
  • Safety
  • Texas / epidemiology