Characterizing accident narratives with word embeddings: Improving accuracy, richness, and generalizability

David M Goldberg

doi:10.1016/j.jsr.2021.12.024

J Safety Res. 2022 Feb:80:441-455. doi: 10.1016/j.jsr.2021.12.024. Epub 2021 Dec 29.

Author

David M Goldberg¹

Affiliation

¹ San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, United States. Electronic address: dgoldberg@sdsu.edu.

PMID: 35249625
DOI: 10.1016/j.jsr.2021.12.024

Abstract

Introduction: Ensuring occupational health and safety is an enormous concern for organizations, as accidents not only harm workers but also result in financial losses. Analysis of accident data has the potential to reveal insights that may improve capabilities to mitigate future accidents. However, because accident data are often transcribed textually, analyzing these narratives proves difficult. This study contributes to a recent stream of literature utilizing machine learning to automatically label accident narratives, converting them into more easily analyzable fields.

Method: First, a large dataset of accident narratives in which workers were injured is collected from the U.S. Occupational Safety and Health Administration (OSHA). Word embeddings-based text mining is implemented; compared to past works, this methodology offers excellent performance. Second, to improve the richness of analyses, each record is assessed across five dimensions. The machine learning models provide classifications of body part(s) injured, the source of the injury, the type of event causing the injury, whether a hospitalization occurred, and whether an amputation occurred. Finally, demonstrating generalizability, the trained models are deployed to analyze two additional datasets of accident narratives in the construction industry and the mining and metals industry (transfer learning).

Practical Applications: These contributions improve organizations' capacities to rapidly analyze textual accident narratives.
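As a rough illustration of the approach described in the Method section, the sketch below represents each narrative as the average of its word vectors and trains one classifier per labeled dimension. It is not the author's implementation: the embedding table `EMBEDDINGS`, the two training narratives, the label names, and the nearest-centroid classifier are all hypothetical stand-ins chosen to keep the example self-contained. A real system would presumably load pretrained embeddings and use a stronger model.

```python
import math
from collections import defaultdict

# Hypothetical 3-dimensional toy embeddings (assumption, not from the paper).
EMBEDDINGS = {
    "finger": [1.0, 0.0, 0.0],
    "hand": [0.9, 0.1, 0.0],
    "saw": [0.8, 0.0, 0.2],
    "amputated": [1.0, 0.0, 0.1],
    "fell": [0.0, 1.0, 0.0],
    "ladder": [0.0, 0.9, 0.1],
    "leg": [0.1, 0.9, 0.0],
    "fractured": [0.0, 0.8, 0.2],
}
DIM = 3


def embed(narrative):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in narrative.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def train_centroids(narratives, labels):
    """Nearest-centroid 'model': one mean narrative vector per label value."""
    grouped = defaultdict(list)
    for text, label in zip(narratives, labels):
        grouped[label].append(embed(text))
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in grouped.items()
    }


# Hypothetical labeled narratives; only two of the five dimensions are shown.
TRAIN = [
    ("employee amputated finger on saw", {"body_part": "hand", "amputation": "yes"}),
    ("employee fell from ladder and fractured leg", {"body_part": "leg", "amputation": "no"}),
]
TEXTS = [text for text, _ in TRAIN]

# One independent model per labeled dimension.
MODELS = {
    dimension: train_centroids(TEXTS, [labels[dimension] for _, labels in TRAIN])
    for dimension in ("body_part", "amputation")
}


def classify(narrative):
    """Predict every dimension by picking the most similar centroid."""
    vector = embed(narrative)
    return {
        dimension: max(centroids, key=lambda label: cosine(vector, centroids[label]))
        for dimension, centroids in MODELS.items()
    }
```

Under these assumptions, the transfer step amounts to calling `classify` unchanged on narratives from a different source, for example `classify("fell off ladder")` on a construction-industry record, without retraining on that source.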

Keywords: Machine learning; Occupational safety; Text mining; Transfer learning; Word embeddings.

MeSH terms

Accidents
Construction Industry*
Data Mining
Humans
Machine Learning
Occupational Health*