A hybrid clustering and random forest model to analyse vulnerable road user to motor vehicle (VRU-MV) crashes

Int J Inj Contr Saf Promot. 2023 Sep;30(3):338-351. doi: 10.1080/17457300.2023.2180804. Epub 2023 Feb 22.

Abstract

The main goal of this study is to investigate the unobserved heterogeneity in VRU-MV crash data and to determine the relative importance of the factors contributing to injury severity. To this end, a latent class analysis (LCA) coupled with a random parameters logit model (LCA-RPL) is developed to segment the VRU-MV crashes into relatively homogeneous clusters and to explore the differences among clusters. The random-forest-based SHapley Additive exPlanation (RF-SHAP) approach is used to explore the relative importance of the contributing factors for injury severity in each cluster. The results show that vulnerable group (VG), intersection or not (ION) and road type (RT) clearly distinguish the crash clusters. Motor vehicle type and functional zone have a significant impact on injury severity across all clusters. Several variables (e.g. ION, crash type [CT], season and RT) show a significant effect in specific sub-cluster models. The results of this study support specific countermeasures that target the contributing factors in each cluster to mitigate VRU-MV crash injury severity.
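The Shapley attribution idea that SHAP builds on can be illustrated with a minimal sketch. This is not the authors' RF-SHAP pipeline: the random forest is replaced here by a hypothetical toy severity function, the feature names (`heavy_vehicle`, `intersection`, `night`) and coefficients are invented for illustration, and the values are computed by brute-force enumeration of feature coalitions rather than with a SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for a handful of features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Instance with only the coalition's features "switched on"
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_f = dict(without)
                with_f[name] = x[name]
                total += weight * (predict(with_f) - predict(without))
        phi[name] = total
    return phi


# Hypothetical toy severity score (assumption, NOT from the study).
# All inputs are 0/1 indicators; intersection and night interact.
def toy_severity(f):
    return 0.1 + 0.3 * f["heavy_vehicle"] + 0.2 * f["intersection"] * f["night"]


crash = {"heavy_vehicle": 1, "intersection": 1, "night": 1}
baseline = {"heavy_vehicle": 0, "intersection": 0, "night": 0}

phi = shapley_values(toy_severity, crash, baseline)
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the interaction term is split equally between `intersection` and `night`, and the attributions sum to the difference between the prediction for the crash and the prediction for the baseline. Ranking factors by mean absolute attribution within each cluster is one common way to obtain a per-cluster importance ordering of the kind the abstract describes.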

Keywords: Injury severity; SHapley Additive exPlanation; latent class analysis; random forest; unobserved heterogeneity; vulnerable road user.

MeSH terms

  • Accidents, Traffic*
  • Cluster Analysis
  • Humans
  • Logistic Models
  • Motor Vehicles
  • Random Forest*