Implementation of a realistic artificial data generator for crash data generation

Accid Anal Prev. 2024 Jun:200:107566. doi: 10.1016/j.aap.2024.107566. Epub 2024 Apr 3.

Abstract

In this paper, a framework is outlined to generate realistic artificial data (RAD) as a tool for comparing different models developed for safety analysis. The primary focus of transportation safety analysis is on identifying and quantifying the influence of factors contributing to traffic crash occurrence and its consequences. The current practice of comparing model structures using only observed data has limitations. With observed data, it is not possible to know how well the models mimic the true relationship between the dependent and independent variables. Further, real datasets do not allow researchers to evaluate model performance at different levels of dataset complexity. RAD offers an innovative framework to address these limitations. Hence, we propose a RAD generation framework embedded with heterogeneous causal structures that generates crash data by treating crash occurrence as a trip-level event affected by trip-level factors, demographics, and roadway and vehicle attributes. Within our RAD generator we employ three specific modules: (a) disaggregate trip information generation, (b) crash data generation and (c) crash data aggregation. For disaggregate trip information generation, we employ a daily activity-travel realization for an urban region generated from an established activity-based model for the Chicago region. We use these data, comprising more than 2 million daily trips, to generate a subset of trips with crash data. For trips with crashes, crash location, crash type, driver/vehicle characteristics, and crash severity are generated. The daily RAD generation process is repeated to generate crash records at yearly or multi-year resolution. The crash databases generated can be employed to compare frequency models, severity models, crash type and various other dimensions by facility type, possibly establishing a universal benchmarking system for alternative model frameworks in the safety literature.
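The three-module structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the trip attributes, the logit form of the crash probability, every coefficient, and the severity shares are hypothetical placeholders chosen only to make the example run, and the random trip generator stands in for the activity-based model output used in the paper.

```python
import math
import random
from collections import Counter

FACILITY_TYPES = ["arterial", "freeway", "local"]  # hypothetical categories


def generate_trips(n_trips, rng):
    """Module (a) stand-in: random trips instead of activity-based model output."""
    return [
        {
            "trip_id": i,
            "facility": rng.choice(FACILITY_TYPES),
            "distance_km": rng.uniform(1.0, 30.0),
            "night": rng.random() < 0.2,
        }
        for i in range(n_trips)
    ]


def crash_probability(trip):
    """Assumed trip-level logit; the coefficients are invented for illustration."""
    utility = -9.0 + 0.05 * trip["distance_km"] + 0.5 * trip["night"]
    return 1.0 / (1.0 + math.exp(-utility))


def generate_crashes(trips, rng):
    """Module (b): flag trips that crash, then draw a severity for each crash."""
    crashes = []
    for trip in trips:
        if rng.random() < crash_probability(trip):
            severity = rng.choices(
                ["property_damage", "injury", "fatal"],
                weights=[0.70, 0.28, 0.02],  # hypothetical shares
            )[0]
            crashes.append({**trip, "severity": severity})
    return crashes


def aggregate_by_facility(crashes):
    """Module (c): aggregate crash records to counts by facility type."""
    return Counter(crash["facility"] for crash in crashes)


def simulate(n_trips_per_day, n_days, seed=0):
    """Repeat the daily process to build crash counts over a longer period."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_days):
        trips = generate_trips(n_trips_per_day, rng)
        totals.update(aggregate_by_facility(generate_crashes(trips, rng)))
    return totals


if __name__ == "__main__":
    print(simulate(n_trips_per_day=20_000, n_days=30, seed=42))
```

Because the data-generating process is known here, a model fitted to the simulated counts could be checked against the assumed coefficients, which is the kind of comparison that observed data alone do not permit.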

Keywords: Crash data generation; Realistic artificial data generation.

MeSH terms

  • Accidents, Traffic* / prevention & control
  • Chicago
  • Databases, Factual
  • Humans
  • Transportation*
  • Travel