Identifying high crash risk segments in rural roads using ensemble decision tree-based models

Sci Rep. 2022 Nov 21;12(1):20024. doi: 10.1038/s41598-022-24476-z.

Abstract

Traffic safety forecast models are mainly used to rank road segments. While existing studies have primarily focused on identifying segments in urban networks, rural networks have received less attention. However, rural networks seem to have a higher risk of severe crashes. This paper aims to analyse traffic crashes on rural roads to identify the influencing factors on the crash frequency and present a framework to develop a spatial-temporal crash risk map to prioritise high-risk segments on different days. The crash data of Khorasan Razavi province is used in this study. Crash frequency data with the temporal resolution of one day and spatial resolution of 1500 m from loop detectors are analysed. Four groups of influential factors, including traffic parameters (e.g. traffic flow, speed, time headway), road characteristics (e.g. road type, number of lanes), weather data (e.g. daily rainfall, snow depth, temperature), and calendar variables (e.g. day of the week, public holidays, month, year) are used for model calibration. Three different decision tree algorithms, including, Decision Tree (DT), Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) have been employed to predict crash frequency. Results show that based on the traditional evaluation measures, the XGBosst is better for the explanation and interpretation of the factors affecting crash frequency, while the RF model is better for detecting trends and forecasting crash frequency. According to the results, the traffic flow rate, road type, year of the crash, and wind speed are the most influencing variables in predicting crash frequency on rural roads. Forecasting the high and medium risk segment-day in the rural network can be essential to the safety management plan. This risk will be sensitive to real traffic data, weather forecasts and road geometric characteristics. Seventy percent of high and medium risk segment-day are predicted for the case study.

MeSH terms

  • Accidents, Traffic*
  • Decision Trees
  • Humans
  • Rural Population*
  • Safety
  • Weather