Bayes classifiers for imbalanced traffic accidents datasets

Accid Anal Prev. 2016 Mar:88:37-51. doi: 10.1016/j.aap.2015.12.003. Epub 2015 Dec 20.

Abstract

Traffic accident data sets are usually imbalanced: the number of instances classified under the killed or severe injuries class (minority) is much lower than the number classified under the slight injuries class (majority). This poses a challenging problem for classification algorithms and may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three data balancing techniques were used: under-sampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers (Averaged One-Dependence Estimators, Weightily Averaged One-Dependence Estimators, and Bayesian networks) were compared on the imbalanced and balanced data sets in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created using oversampling techniques, with Bayesian networks improved the classification of traffic accidents by severity and reduced the misclassification of killed and severe injuries instances. In addition, the following variables were found to contribute to the occurrence of a fatality or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. To the authors' knowledge, this work is the first to analyze historical traffic accident records from Jordan and the first to apply balancing techniques to the analysis of injury severity in traffic accidents.

Keywords: Bayesian networks; Imbalanced data set; SMOTE; Traffic accidents; Urban area.
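
The three balancing strategies named in the abstract (under-sampling, oversampling, and a mix of both) can be illustrated with a minimal Python sketch. This is not the authors' procedure: the abstract does not give the sampling details, and the keywords point to SMOTE, which synthesizes new minority instances, whereas this sketch simply duplicates existing minority rows at random. The function names, the `severity` label, the `lighting` feature, and the toy 90/10 class split are all assumptions made for illustration.

```python
import random
from collections import Counter


def split_by_class(rows, label_key):
    """Group rows (dicts) by the value of their class label."""
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    return groups


def resample_to(group, target, rng):
    """Return exactly `target` rows drawn from `group`.

    If the group is too large, rows are dropped at random (under-sampling).
    If it is too small, random copies are appended (naive oversampling,
    a stand-in for SMOTE, which would interpolate new instances instead).
    """
    if len(group) >= target:
        return rng.sample(group, target)
    extra = [dict(rng.choice(group)) for _ in range(target - len(group))]
    return list(group) + extra


def balance(rows, label_key, strategy, rng):
    """Balance class sizes using 'under', 'over', or 'mixed' sampling."""
    groups = split_by_class(rows, label_key)
    sizes = [len(g) for g in groups.values()]
    if strategy == "under":
        target = min(sizes)                 # shrink everything to the minority size
    elif strategy == "over":
        target = max(sizes)                 # grow everything to the majority size
    elif strategy == "mixed":
        target = sum(sizes) // len(sizes)   # meet at the mean class size
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    balanced = []
    for group in groups.values():
        balanced.extend(resample_to(group, target, rng))
    return balanced


def class_counts(rows, label_key):
    """Count rows per class label."""
    return Counter(row[label_key] for row in rows)


# Hypothetical toy data: 90 slight-injury rows, 10 killed-or-severe rows.
rng = random.Random(0)
toy = (
    [{"lighting": "day", "severity": "slight"} for _ in range(90)]
    + [{"lighting": "night", "severity": "severe"} for _ in range(10)]
)

under = class_counts(balance(toy, "severity", "under", rng), "severity")
over = class_counts(balance(toy, "severity", "over", rng), "severity")
mixed = class_counts(balance(toy, "severity", "mixed", rng), "severity")
```

On the toy data, under-sampling leaves 10 rows per class, oversampling leaves 90 per class, and the mixed strategy leaves 50 per class. Balancing would normally be applied to the training split only, so that the evaluation data keep the original class distribution.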

MeSH terms

  • Accidents, Traffic / mortality
  • Accidents, Traffic / statistics & numerical data*
  • Algorithms*
  • Bayes Theorem
  • Cities
  • Datasets as Topic*
  • Environment Design
  • Humans
  • Jordan / epidemiology
  • Trauma Severity Indices
  • Weather
  • Wounds and Injuries / epidemiology*
  • Wounds and Injuries / mortality