Characterizing accident narratives with word embeddings: Improving accuracy, richness, and generalizability

J Safety Res. 2022 Feb:80:441-455. doi: 10.1016/j.jsr.2021.12.024. Epub 2021 Dec 29.

Abstract

Introduction: Ensuring occupational health and safety is an enormous concern for organizations, as accidents not only harm workers but also result in financial losses. Analysis of accident data has the potential to reveal insights that may improve capabilities to mitigate future accidents. However, because accident data are often recorded as free-text narratives, analyzing them proves difficult. This study contributes to a recent stream of literature utilizing machine learning to automatically label accident narratives, converting them into more easily analyzable fields.

Method: First, a large dataset of accident narratives in which workers were injured is collected from the U.S. Occupational Safety and Health Administration (OSHA). Word embedding-based text mining is implemented; compared to past works, this methodology offers excellent performance. Second, to improve the richness of analyses, each record is assessed across five dimensions. The machine learning models provide classifications of body part(s) injured, the source of the injury, the type of event causing the injury, whether a hospitalization occurred, and whether an amputation occurred. Finally, demonstrating generalizability, the trained models are deployed to analyze two additional datasets of accident narratives in the construction industry and the mining and metals industry (transfer learning).

Practical Applications: These contributions improve organizations' capacities to rapidly analyze textual accident narratives.
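
As a rough illustration of the Method described above, the following Python sketch represents each narrative as the average of its word vectors and trains one simple classifier per label dimension. It is a toy under stated assumptions, not the authors' implementation: the three-dimensional vectors, the example narratives, the label names, and the nearest-centroid classifier are all invented here for illustration, and only three of the five dimensions are shown. A real system would load pretrained word embeddings and use a stronger model.

```python
# Toy sketch: averaged word embeddings plus one nearest-centroid classifier
# per label dimension. All vectors, narratives, and labels are invented
# for illustration and do not come from the paper or from OSHA data.
from collections import defaultdict
import math

# Hypothetical 3-dimensional word vectors (assumption, not real embeddings).
EMBEDDINGS = {
    "finger": [1.0, 0.0, 0.0],
    "hand": [0.9, 0.1, 0.0],
    "saw": [0.8, 0.0, 0.2],
    "amputated": [1.0, 0.0, 0.1],
    "fell": [0.0, 1.0, 0.0],
    "ladder": [0.0, 0.9, 0.1],
    "leg": [0.1, 0.9, 0.0],
    "fractured": [0.0, 1.0, 0.1],
}
DIM = 3


def embed(narrative):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in narrative.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def train_centroids(narratives, labels):
    """Compute the mean narrative vector for each label value."""
    groups = defaultdict(list)
    for text, label in zip(narratives, labels):
        groups[label].append(embed(text))
    return {
        label: [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]
        for label, vecs in groups.items()
    }


def predict(centroids, narrative):
    """Return the label whose centroid is most similar to the narrative."""
    vec = embed(narrative)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training narratives with one label list per dimension.
TRAIN_TEXTS = [
    "worker amputated finger on saw",
    "hand caught in saw",
    "employee fell from ladder and fractured leg",
    "worker fell and fractured leg",
]
TRAIN_LABELS = {
    "body_part": ["hand", "hand", "leg", "leg"],
    "event": ["contact_with_equipment", "contact_with_equipment", "fall", "fall"],
    "amputation": ["yes", "no", "no", "no"],
}

# One independent model per dimension, all trained on the same narratives.
MODELS = {dim: train_centroids(TRAIN_TEXTS, labels) for dim, labels in TRAIN_LABELS.items()}


def classify(narrative):
    """Label a narrative on every dimension using the trained models."""
    return {dim: predict(model, narrative) for dim, model in MODELS.items()}


if __name__ == "__main__":
    # Applying the already-trained models, unchanged, to an unseen narrative
    # loosely mirrors the deployment step described in the abstract.
    print(classify("operator fell off ladder"))
```

Keeping one model per dimension means that each label (body part, event type, amputation, and so on) can be predicted and evaluated separately, and the same trained models can be pointed at narratives from another source without retraining.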

Keywords: Machine learning; Occupational safety; Text mining; Transfer learning; Word embeddings.

MeSH terms

  • Accidents
  • Construction Industry*
  • Data Mining
  • Humans
  • Machine Learning
  • Occupational Health*