Traffic accident duration prediction using text mining and ensemble learning on expressways

Sci Rep. 2022 Dec 12;12(1):21478. doi: 10.1038/s41598-022-25988-4.

Abstract

Predicting traffic accident duration is necessary for ensuring traffic safety. Several attempts have been made to achieve high prediction accuracy, but researchers have not considered traffic accident text data in much detail. The limited text of the first report on an incident describes the characteristics of an accident that are initially available. This paper uses text data fusion and ensemble learning algorithms to build a model that predicts an accident's duration. First, a preprocessing scheme for accident duration text data is established. Next, the random forest (RF) algorithm is applied to select the text feature variables related to traffic incident duration. Last, a text feature vector is introduced to models such as decision tree, k-nearest neighbor, support vector regression, random forest, Gradient Boosting Decision Tree, and eXtreme Gradient Boosting. Our results show that the improved RF model achieves good prediction accuracy as measured by RMSE, MAPE and R². From this, the textual factors important to determining accident duration are identified. Further, we found that the features accounting for a cumulative importance of 60% are sufficient for traffic accident duration prediction using text data. These results provide insights into minimizing accident-related traffic congestion and contribute to input optimization in text-based prediction.
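
As an illustration of the feature-selection and evaluation steps named in the abstract, the sketch below keeps the highest-ranked text features until their cumulative importance reaches 60%, and computes RMSE, MAPE and R² for a set of predictions. It is a minimal sketch, not the authors' implementation: the abstract does not say how the threshold is applied, so ranking by descending importance is an assumption, and the function names, keyword names, importance scores and durations are all hypothetical. In practice the scores might come from a fitted random forest (for example, the `feature_importances_` attribute in scikit-learn).

```python
import math


def select_by_cumulative_importance(importances, threshold=0.60):
    """Keep top-ranked features until their summed importance share reaches threshold.

    importances: dict mapping feature name -> non-negative importance score.
    Assumption: features are ranked by descending importance before accumulating.
    """
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total  # normalise so shares sum to 1
        if cumulative >= threshold - 1e-12:  # small tolerance for float rounding
            break
    return selected


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical keyword importances, for illustration only.
example_importances = {
    "rollover": 0.30,
    "truck": 0.25,
    "injury": 0.20,
    "lane_closed": 0.15,
    "rain": 0.10,
}
kept = select_by_cumulative_importance(example_importances, threshold=0.60)

# Hypothetical observed and predicted durations (minutes).
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
scores = (rmse(observed, predicted), mape(observed, predicted), r2(observed, predicted))
```

With these made-up scores, the first two keywords cover 55% of the total importance, so a third is needed to cross the 60% threshold; the remaining keywords would be dropped from the text feature vector before the models are trained.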

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Accidents, Traffic* / prevention & control
  • Algorithms
  • Data Mining*
  • Machine Learning