Analyzing relationships between latent topics in autonomous vehicle crash narratives and crash severity using natural language processing techniques and explainable XGBoost

Accid Anal Prev. 2024 Aug:203:107605. doi: 10.1016/j.aap.2024.107605. Epub 2024 May 13.

Abstract

Safety is one of the most essential considerations when evaluating the performance of autonomous vehicles (AVs). Real-world AV data, including trajectory, detection, and crash data, are becoming increasingly popular because they allow a realistic evaluation of AVs' performance. Although substantial research has been conducted to estimate general crash patterns from structured AV crash data, comprehensive exploration of AV crash narratives remains limited. These narratives contain latent information about AV crashes that can further the understanding of AV safety. Therefore, this study uses the Structural Topic Model (STM), a natural language processing technique, to extract latent topics from unstructured AV crash narratives while incorporating crash metadata (i.e., the severity and year of crashes). In total, 15 topics are identified and divided into behavior-related, party-related, location-related, and general topics. Using these topics, AV crashes can be systematically described and clustered. Results from the STM suggest that AVs' abilities to interact with vulnerable road users (VRUs) and to react to lane-change behavior need further improvement. Moreover, an XGBoost model is developed to investigate the relationships between the topics and crash severity. The model significantly outperforms the models reported in existing studies in terms of accuracy, suggesting that the extracted topics are closely related to crash severity. Results from interpreting the model indicate that topics containing information about crash severity and VRUs have significant impacts on the model's output, and the study suggests including this information in future AV crash reporting.
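The idea of describing and clustering crashes by topic can be illustrated with a minimal sketch. It is not the study's implementation: it assumes a fitted topic model has already produced per-narrative topic proportions, and it uses four made-up topics rather than the 15 reported. The mapping `TOPIC_CATEGORY`, the narrative ids, the proportions, and the helper names are all hypothetical, and assigning each narrative to its single dominant topic is a simplification chosen for illustration.

```python
from collections import defaultdict

# Hypothetical topic-to-category mapping. The abstract names the four
# categories but does not say which topic belongs to which one.
TOPIC_CATEGORY = {
    0: "behavior-related",
    1: "party-related",
    2: "location-related",
    3: "general",
}


def dominant_topic(proportions):
    """Return the index of the topic with the largest proportion."""
    return max(range(len(proportions)), key=lambda k: proportions[k])


def cluster_by_dominant_topic(doc_topic):
    """Group narrative ids by the category of their dominant topic."""
    clusters = defaultdict(list)
    for doc_id, proportions in doc_topic.items():
        category = TOPIC_CATEGORY[dominant_topic(proportions)]
        clusters[category].append(doc_id)
    return dict(clusters)


# Made-up topic proportions (each row sums to 1), standing in for the
# document-topic matrix that a fitted topic model would return.
doc_topic = {
    "crash_a": [0.70, 0.10, 0.10, 0.10],
    "crash_b": [0.05, 0.15, 0.60, 0.20],
    "crash_c": [0.55, 0.25, 0.10, 0.10],
}

clusters = cluster_by_dominant_topic(doc_topic)
```

In a fuller pipeline the same proportion vectors, rather than only the dominant topic, would plausibly serve as the input features of the severity classifier.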

Keywords: Autonomous vehicle safety; Machine learning; Natural language processing; Topic modeling.

MeSH terms

  • Accidents, Traffic*
  • Automobiles
  • Humans
  • Narration
  • Natural Language Processing*