Application of classification algorithms for analysis of road safety risk factor dependencies

Accid Anal Prev. 2015 Feb:75:1-15. doi: 10.1016/j.aap.2014.11.005. Epub 2014 Nov 16.

Abstract

Transportation is an integral part of modern life, which makes road traffic safety a major concern. Recent road traffic safety studies have therefore focused on the risk factors that affect the fatality and injury level (severity) of traffic accidents. Although some risk factors, such as drug use and drinking, are widely known to affect severity, accurately modeling their influence remains an open research topic, and many other risk factors have yet to be discovered or analyzed. A promising approach is to investigate the historical traffic accident data collected over the past decades. This study examines traffic accident reports accumulated by the California Highway Patrol (CHP) since 1973, each of which contains around 100 data fields. Within the data from 2004 to 2010, we investigate the 25 fields most relevant to car accidents. Two classification methods, the Naive Bayes classifier and the decision tree classifier, are used to reveal the relative importance of the data fields, i.e., risk factors, with respect to the resulting severity level. The performance of the two classifiers is compared, with a binary logistic regression model serving as the baseline. Some of the high-ranking risk factors are found to be strongly dependent on each other, and their incremental gains in estimating or modeling severity level are evaluated quantitatively. The analysis shows that only a handful of the risk factors in the data dominate the severity level, and that dependency among the top risk factors must be taken into account for an accurate analysis.

Keywords: Decision tree classifier; Dependency; Naive Bayes classifier; Risk factor; Severity level; Traffic accident analysis.
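
The idea of ranking risk factors and then measuring the incremental gain of a factor that depends on another can be illustrated with a small sketch. The code below is not the study's implementation: it assumes entropy-based information gain (a split criterion commonly used by decision tree classifiers; the abstract does not state which criterion was used), and the field names `alcohol`, `night`, `wet` and the eight records are hypothetical toy data, not CHP data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def conditional_entropy(rows, given, target):
    """H(target | given), where `given` is a tuple of field names."""
    groups = {}
    for row in rows:
        key = tuple(row[field] for field in given)
        groups.setdefault(key, []).append(row[target])
    total = len(rows)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def information_gain(rows, field, target, known=()):
    """Drop in uncertainty about `target` when `field` is added to `known`.

    With known=() this is the ordinary (marginal) information gain.
    With known=("x",) it is the incremental gain of `field` once "x" is known.
    """
    known = tuple(known)
    return (conditional_entropy(rows, known, target)
            - conditional_entropy(rows, known + (field,), target))


# Hypothetical toy records (assumption): columns are alcohol, night, wet, severity.
FIELDS = ("alcohol", "night", "wet", "severity")
RAW = [
    ("yes", "yes", "no", "severe"),
    ("yes", "yes", "yes", "severe"),
    ("yes", "yes", "no", "severe"),
    ("yes", "no", "yes", "severe"),
    ("no", "no", "no", "minor"),
    ("no", "no", "yes", "minor"),
    ("no", "no", "no", "minor"),
    ("no", "yes", "yes", "minor"),
]
records = [dict(zip(FIELDS, values)) for values in RAW]

# Rank the candidate risk factors by their marginal information gain.
factors = ("alcohol", "night", "wet")
ranking = sorted(
    ((f, information_gain(records, f, "severity")) for f in factors),
    key=lambda pair: pair[1],
    reverse=True,
)

# Incremental gain of `night` once `alcohol` is already known.
night_given_alcohol = information_gain(
    records, "night", "severity", known=("alcohol",)
)

if __name__ == "__main__":
    for name, gain in ranking:
        print(f"{name}: {gain:.3f} bits")
    print(f"night given alcohol: {night_given_alcohol:.3f} bits")
```

In this toy data `night` looks informative on its own, but it is correlated with `alcohol`, so it adds nothing once `alcohol` is known. This is the kind of dependency among high-ranking factors that the abstract says must be considered; real accident data would of course be far noisier.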

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Accidents, Traffic / classification*
  • Accidents, Traffic / mortality
  • Accidents, Traffic / statistics & numerical data*
  • Algorithms*
  • Bayes Theorem
  • California
  • Decision Trees
  • Humans
  • Logistic Models
  • ROC Curve
  • Risk Factors
  • Safety*