Automatically disambiguating medical acronyms with ontology-aware deep learning

Marta Skreta; Aryan Arbabi; Jixuan Wang; Erik Drysdale; Jacob Kelly; Devin Singh; Michael Brudno

doi:10.1038/s41467-021-25578-4

Nat Commun. 2021 Sep 7;12(1):5319. doi: 10.1038/s41467-021-25578-4.

Authors

Marta Skreta¹,²,³,⁴; Aryan Arbabi⁵,⁶,⁷,⁸; Jixuan Wang⁵,⁶,⁷,⁸; Erik Drysdale⁹; Jacob Kelly⁵,⁸; Devin Singh⁵,⁷; Michael Brudno¹⁰,¹¹,¹²,¹³

Affiliations

¹ Department of Computer Science, University of Toronto, Toronto, Canada. martaskreta@cs.toronto.edu.
² DATA Team & Techna Institute, University Health Network, Toronto, Canada. martaskreta@cs.toronto.edu.
³ Centre for Computational Medicine, The Hospital for Sick Children, Toronto, Canada. martaskreta@cs.toronto.edu.
⁴ Vector Institute for Artificial Intelligence, Toronto, Canada. martaskreta@cs.toronto.edu.
⁵ Department of Computer Science, University of Toronto, Toronto, Canada.
⁶ DATA Team & Techna Institute, University Health Network, Toronto, Canada.
⁷ Centre for Computational Medicine, The Hospital for Sick Children, Toronto, Canada.
⁸ Vector Institute for Artificial Intelligence, Toronto, Canada.
⁹ The Hospital for Sick Children, Toronto, Canada.
¹⁰ Department of Computer Science, University of Toronto, Toronto, Canada. brudno@cs.toronto.edu.
¹¹ DATA Team & Techna Institute, University Health Network, Toronto, Canada. brudno@cs.toronto.edu.
¹² Centre for Computational Medicine, The Hospital for Sick Children, Toronto, Canada. brudno@cs.toronto.edu.
¹³ Vector Institute for Artificial Intelligence, Toronto, Canada. brudno@cs.toronto.edu.

Abstract

Modern machine learning (ML) technologies have great promise for automating diverse clinical and research workflows; however, training them requires extensive hand-labelled datasets. Disambiguating abbreviations is important for automated clinical note processing; however, broad deployment of ML for this task is restricted by the scarcity and imbalance of labelled training data. In this work we present a method that improves a model's ability to generalize through novel data augmentation techniques that utilize information from biomedical ontologies, in the form of related medical concepts, as well as global context information within the medical note. We train our model on a public dataset (MIMIC III) and test its performance on automatically generated and hand-labelled datasets from different sources (MIMIC III, CASI, i2b2). Together, these techniques boost the accuracy of abbreviation disambiguation by up to 17% on hand-labelled data, without sacrificing performance on a held-out test set from MIMIC III.
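
The abstract does not spell out the augmentation procedure, so the following is only a rough sketch of one way ontology-based augmentation could work: when an expansion has too few labelled contexts, borrow contexts that mention related concepts from an ontology. The function name, the toy ontology, the example sentences, and the `min_count` threshold are all hypothetical and are not taken from the paper.

```python
def augment_with_related(labelled, related, concept_corpus, min_count):
    """Top up rare expansions with contexts of ontology-related concepts.

    labelled:       expansion -> list of labelled context sentences
    related:        expansion -> list of related concepts (toy ontology, assumed)
    concept_corpus: concept -> sentences that mention that concept
    min_count:      target number of contexts per expansion (assumed threshold)
    """
    augmented = {}
    for expansion, contexts in labelled.items():
        pool = list(contexts)
        for neighbour in related.get(expansion, []):
            needed = min_count - len(pool)
            if needed <= 0:
                break  # this expansion already has enough contexts
            # Borrow only as many neighbour sentences as are still missing.
            pool.extend(concept_corpus.get(neighbour, [])[:needed])
        augmented[expansion] = pool
    return augmented


# Illustrative data only: two candidate expansions of the abbreviation "PE".
labelled = {
    "physical exam": ["PE unremarkable", "PE shows no edema", "normal PE today"],
    "pulmonary embolism": ["CT confirmed PE"],
}
related = {
    "pulmonary embolism": ["pulmonary infarction", "deep vein thrombosis"],
}
concept_corpus = {
    "pulmonary infarction": ["wedge-shaped pulmonary infarction on CT"],
    "deep vein thrombosis": ["started heparin for deep vein thrombosis",
                             "deep vein thrombosis of the left leg"],
}

balanced = augment_with_related(labelled, related, concept_corpus, min_count=3)
for expansion, contexts in balanced.items():
    print(expansion, len(contexts))
```

In this toy run the rare expansion is padded to the same number of contexts as the common one, while the common expansion is left untouched. A real pipeline would draw the related concepts from an actual biomedical ontology and would need to decide how strongly to weight borrowed contexts against genuine ones.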

MeSH terms

Biomedical Research
Data Mining / methods*
Datasets as Topic
Deep Learning*
Humans
Terminology as Topic*