Neural network language models, such as BERT, can be used for information extraction from unstructured free-text medical records. These models can be pre-trained on a large corpus to learn the language and characteristics of the relevant domain, and then fine-tuned on labeled data for a specific task. We propose a pipeline that uses human-in-the-loop labeling to create annotated data for Estonian healthcare information extraction. This approach is particularly useful for low-resource languages and is more accessible to medical practitioners than rule-based methods such as regular expressions.
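The human-in-the-loop idea above can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: all names, labels, and the confidence threshold are hypothetical. A model proposes entity labels for each token, confident predictions are accepted automatically, and uncertain ones are routed to a human annotator, so the labeled set grows with minimal manual effort.

```python
# Minimal sketch of one human-in-the-loop labeling round.
# Hypothetical names and threshold; not the paper's actual pipeline.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Prediction:
    token: str
    label: str        # e.g. "B-DRUG" or "O" in BIO-style NER tagging
    confidence: float


def labeling_round(
    tokens: list[str],
    model_predict: Callable[[list[str]], list[Prediction]],
    human_annotate: Callable[[Prediction], str],
    threshold: float = 0.9,
) -> list[tuple[str, str]]:
    """Return (token, label) pairs, asking a human about uncertain tokens."""
    labeled = []
    for pred in model_predict(tokens):
        if pred.confidence >= threshold:
            labeled.append((pred.token, pred.label))            # trust the model
        else:
            labeled.append((pred.token, human_annotate(pred)))  # ask a human
    return labeled


# Stub model and "annotator" for demonstration only:
def stub_model(tokens: list[str]) -> list[Prediction]:
    known = {"ibuprofeen": ("B-DRUG", 0.97)}   # one confidently known drug name
    return [Prediction(t, *known.get(t, ("O", 0.5))) for t in tokens]


labels = labeling_round(
    ["Patsient", "võtab", "ibuprofeen"],
    stub_model,
    human_annotate=lambda p: "O",   # the "human" marks unknown tokens as outside
)
print(labels)
```

The confirmed pairs from each round would be appended to the training set before the next fine-tuning pass, so the model's confident region expands and the human workload shrinks over successive rounds.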
Keywords: BERT; information extraction; medical texts; named entity recognition; natural language processing.