Bayesian networks for imbalance data to investigate the contributing factors to fatal injury crashes on the Ghanaian highways

Accid Anal Prev. 2021 Feb:150:105936. doi: 10.1016/j.aap.2020.105936. Epub 2020 Dec 17.

Abstract

Crash data are often heavily imbalanced: fatal injury (minority) crashes are significantly underrepresented relative to non-fatal injury (majority) crashes. This imbalance poses a major challenge to most statistical learning methods and needs to be addressed during data preprocessing. To this end, we compare three data balancing methods: the Synthetic Minority Oversampling Technique (SMOTE), Borderline SMOTE (BL-SMOTE), and the Majority Weighted Minority Oversampling Technique (MWMOTE). We then examine different Bayesian networks (BNs) to explore the contributing factors of fatal injury crashes. The 2016 highway crash data of Ghana are used for the case study. The results show that the accuracy of injury severity classification improves when the preprocessed data are used, with the highest improvement observed on the data preprocessed by MWMOTE. Statistical verification is performed with the Wilcoxon signed-rank test. Inference with the best BNs identifies the significant factors of fatal crashes, which include off-peak time, non-intersection areas, pedestrian-involved collisions, rural road environments, good tarred roads, roads without shoulders, and multi-vehicle crashes.
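The core idea behind the SMOTE family of oversampling methods named above can be illustrated with a minimal sketch: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. The code below is an illustration under stated assumptions, not the authors' implementation. The function name `smote_sketch`, the toy two-dimensional points, and the parameter values are hypothetical, and the sketch assumes purely numeric features, whereas real crash records are largely categorical and would need additional handling.

```python
import random
from math import dist


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority
    points and their k nearest minority neighbours (basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Find its k nearest neighbours among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(x, p))[:k]
        nb = rng.choice(neighbours)
        # Interpolate at a random position between x and the neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy numeric minority class (hypothetical values, not the Ghana data).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=6, k=2, seed=42)
for point in new_points:
    print(point)
```

Because every synthetic point is an interpolation between two existing minority points, it always lies inside the region already spanned by the minority class. BL-SMOTE and MWMOTE differ mainly in which minority samples they select as starting points and how they weight them; those selection rules are not shown here.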

Keywords: Bayesian networks; Classification; Crash injury severity; Imbalance data; Oversampling techniques.

MeSH terms

  • Accidents, Traffic
  • Bayes Theorem
  • Ghana / epidemiology
  • Humans
  • Pedestrians*
  • Rural Population
  • Wounds and Injuries* / epidemiology