Predicting effects of built environment on fatal pedestrian accidents at location-specific level: Application of XGBoost and SHAP

Accid Anal Prev. 2022 Mar:166:106545. doi: 10.1016/j.aap.2021.106545. Epub 2022 Jan 4.

Abstract

Understanding locally heterogeneous physical contexts in the built environment is of great importance in developing preemptive countermeasures to mitigate pedestrian fatality risks. In this study, we aim to investigate the non-linear relationship between physical factors and pedestrian fatality at a location-specific level using a machine learning approach. A state-of-the-art machine learning algorithm, eXtreme Gradient Boosting (XGBoost), is employed for a binary classification problem, in which locations across Korea where fatal pedestrian accidents occurred from 2012 to 2019 serve as positive samples (np = 13,366). For negative samples, locations with no pedestrian accidents are selected randomly, ten times as many as the positive samples (nn = 133,660). Fifteen features under the categories of road conditions, road facilities, road networks, and land uses are assigned to both the positive and negative sample locations using a Geographic Information System (GIS). A method is proposed to avoid the class imbalance problem, and a final unbiased model is used to predict fatal pedestrian risks at the negative sample locations. In addition, Shapley Additive Explanations (SHAP) is introduced to provide a robust interpretation of the XGBoost prediction results. The results show that 21.6% of the negative sample locations have a probability of fatal pedestrian accidents greater than 0.5 (i.e., 78.4% accuracy on the negative samples). Generally, a road segment that lies on many of the shortest routes in a dense residential area, with lively activity generated by aligned buildings, is a potential spot for fatal pedestrian accidents. However, the SHAP interpretation shows that the relationships between the features and pedestrian fatality are nonlinear and locally heterogeneous. We discuss the implications of this result for drafting policy recommendations to reduce pedestrian fatalities.
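SHAP attributes a single prediction to its input features using Shapley values, so each location gets its own feature contributions; this is what makes a "locally heterogeneous" reading possible. The sketch below computes exact Shapley values by enumerating feature subsets for one instance. It is an illustration only, not the authors' pipeline. The toy `toy_risk` model, its three hypothetical features (a shortest-route/betweenness score, residential density, building frontage), the coefficients, and the single all-zero baseline used to "switch off" absent features are all assumptions made for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance `x`.

    A feature that is "absent" from a coalition is replaced by its value in
    `baseline` (a simple value function with a single background point).
    Cost grows as 2**n, so this is only feasible for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition keep the instance's values; others use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk score: [route_centrality, residential_density, frontage].

    The interaction term makes frontage matter only where centrality is high,
    a crude stand-in for the non-linear, location-dependent effects a tree model can learn.
    """
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]


if __name__ == "__main__":
    x = [1.0, 1.0, 1.0]
    baseline = [0.0, 0.0, 0.0]
    phi = shapley_values(toy_risk, x, baseline)
    print([round(p, 6) for p in phi])  # interaction credit is split between features 0 and 2
    print(round(sum(phi), 6), round(toy_risk(x) - toy_risk(baseline), 6))
```

The contributions sum to `toy_risk(x) - toy_risk(baseline)` (the efficiency property), and the 0.2 interaction is shared equally between the two interacting features, giving roughly [0.6, 0.3, 0.1]. For a real XGBoost model with fifteen features, exhaustive enumeration is replaced in practice by tree-specific algorithms such as `shap.TreeExplainer` in the `shap` Python package; the abstract does not state which explainer settings the study used.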

Keywords: Built environment; Fatal pedestrian accidents; Local heterogeneity; SHAP; XGBoost.

MeSH terms

  • Accidents, Traffic
  • Built Environment
  • Geographic Information Systems
  • Humans
  • Machine Learning
  • Pedestrians*