Geographically weighted random forests for macro-level crash frequency prediction

Accid Anal Prev. 2024 Jan:194:107370. doi: 10.1016/j.aap.2023.107370. Epub 2023 Nov 6.

Abstract

Machine learning models such as random forests (RF) have been widely applied in the field of road safety. RF is a prominent algorithm that overcomes limitations of a single decision tree, such as overfitting and instability. However, traditional RF is a global model and may therefore fail to capture spatial variability. In macro-level analysis of road safety, the relationship between crash frequency and risk factors can vary spatially. To address this issue, we employ a modified RF algorithm, named geographically weighted random forest (GWRF). Using data from London at the Middle Layer Super Output Area (MSOA) level, we compare the predictive performance of RF and GWRF in terms of mean absolute error (MAE) and root mean square error (RMSE). Moreover, because MSOAs are geographically connected with each other, several factors describing the discrepancies between adjacent zones are also included in the models. Our results indicate that GWRF outperforms both traditional RF and geographically weighted regression (GWR) when an appropriate bandwidth is selected. We further explore the effects of multicollinearity on model performance. The results show that the prediction accuracy of GWRF models is not susceptible to multicollinearity. However, the importance values of variables affected by multicollinearity may decrease. Finally, and of equal importance, we find that the importance of each explanatory variable varies across zones. The density of minor roads makes the highest contribution to crash frequency in the downtown area, while crash frequency in the peripheral area is more sensitive to the discrepancy in road environment between MSOAs. With such information, road safety interventions can be designed and implemented according to the locally important factors, rather than following general guidelines applied to the entire city.
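As an illustration of the two error measures named above, and of how a bandwidth can restrict a local model to nearby zones, a minimal Python sketch follows. It is not the authors' implementation. The abstract does not say how the bandwidth is defined, so the sketch assumes an adaptive bandwidth expressed as a number of nearest zones, with proximity measured between zone centroids. The function names (`mae`, `rmse`, `nearest_zones`) and all numbers are hypothetical.

```python
import math


def mae(observed, predicted):
    """Mean absolute error between observed and predicted crash counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted crash counts."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def nearest_zones(centroids, target, bandwidth):
    """Indices of the `bandwidth` zones closest to zone `target`.

    Assumption: an adaptive bandwidth counted in zones, with Euclidean
    distance between centroids. The target zone itself is included.
    """
    tx, ty = centroids[target]
    order = sorted(
        range(len(centroids)),
        key=lambda i: math.hypot(centroids[i][0] - tx, centroids[i][1] - ty),
    )
    return order[:bandwidth]


# Made-up values for four hypothetical zones (not data from the study).
observed = [12, 30, 7, 18]
predicted = [10, 33, 7, 22]
centroids = [(0, 0), (1, 0), (5, 5), (0, 2)]

print("MAE:", mae(observed, predicted))
print("RMSE:", rmse(observed, predicted))
print("Zones used for the local model of zone 0:", nearest_zones(centroids, 0, 3))
```

Under these assumptions, a local random forest for a given zone would be trained only on the zones returned by `nearest_zones`, and the bandwidth would be chosen by comparing MAE and RMSE across candidate values.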

Keywords: Crash frequency prediction; Random forest; Road traffic safety; Spatial analysis.

MeSH terms

  • Accidents, Traffic* / prevention & control
  • Algorithms
  • Humans
  • Models, Statistical*
  • Random Forest
  • Risk Factors