A large dataset of annotated incident reports on medication errors

Sci Data. 2024 Feb 29;11(1):260. doi: 10.1038/s41597-024-03036-2.

Abstract

Incident reports of medication errors are valuable learning resources for improving patient safety. However, pertinent information is often contained within unstructured free text, which prevents automated analysis and limits the usefulness of these data. Natural language processing can structure this free text automatically and retrieve relevant past incidents and learning materials, but doing so requires a large, fully annotated and validated corpus of incident reports. We present a corpus of 58,658 machine-annotated incident reports of medication errors that can be used to advance the development of information extraction models and subsequent incident learning. In cross-validation, the best F1-scores on the annotated dataset were 0.97 for named entity recognition and 0.76 for intention/factuality analysis. Our dataset contains 478,175 named entities and differentiates between incident types by recognising discrepancies between what was intended and what actually occurred. We explain our annotation workflow and technical validation, and we provide access to the validation datasets and the machine annotator for labelling future incident reports of medication errors.
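Entity-level F1-scores like those reported above are commonly computed with exact-match scoring, where a predicted entity counts as correct only if its label and span both match a gold annotation. The sketch below illustrates that computation; the entity representation, the example labels (DRUG, DOSE, ROUTE), and the strict-match criterion are illustrative assumptions, not the paper's actual tag set or evaluation code.

```python
def entity_f1(gold, pred):
    """Exact-match entity-level F1: a predicted entity is a true positive
    only if its (label, start, end) triple matches a gold entity."""
    if not gold or not pred:
        return 0.0
    tp = len(gold & pred)  # entities matching exactly in label and span
    precision = tp / len(pred)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations as (label, start_token, end_token) tuples;
# the DOSE span boundary differs between gold and prediction.
gold = {("DRUG", 0, 2), ("DOSE", 5, 7), ("ROUTE", 9, 10)}
pred = {("DRUG", 0, 2), ("DOSE", 5, 6), ("ROUTE", 9, 10)}

print(round(entity_f1(gold, pred), 2))  # → 0.67
```

With two of three entities matched in both directions, precision and recall are each 2/3, giving F1 = 2/3. Under strict matching, a one-token boundary error costs the model both a false positive and a false negative, which is why span-level scores are a demanding evaluation.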

Publication types

  • Dataset

MeSH terms

  • Information Storage and Retrieval*
  • Medication Errors*
  • Natural Language Processing