Information Extraction from Medical Texts with BERT Using Human-in-the-Loop Labeling

Stud Health Technol Inform. 2023 May 18;302:831-832. doi: 10.3233/SHTI230281.

Abstract

Neural network language models, such as BERT, can be used for information extraction from unstructured free-text medical documents. These models can be pre-trained on a large corpus to learn the language and characteristics of the relevant domain and then fine-tuned with labeled data for a specific task. We propose a pipeline that uses human-in-the-loop labeling to create annotated data for information extraction from Estonian healthcare texts. This approach is particularly useful for low-resource languages and is more accessible to practitioners in the medical field than rule-based methods such as regular expressions.
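
The abstract describes fine-tuning a pre-trained BERT model on labeled data for a token-level extraction task (named entity recognition, per the keywords). Below is a minimal sketch of that fine-tuning step, not the authors' code: it uses the Hugging Face transformers library, a placeholder checkpoint name, a hypothetical NER tag set, and a toy annotated sentence standing in for output of the human-in-the-loop labeling step.

```python
# Minimal fine-tuning sketch (assumptions: checkpoint name, tag set, and the
# toy sentence are placeholders, not taken from the paper).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "bert-base-multilingual-cased"   # placeholder pre-trained checkpoint
LABELS = ["O", "B-DRUG", "I-DRUG"]            # hypothetical NER tag set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

# One annotated example, as it might come out of human-in-the-loop labeling
# (toy data; real input would be Estonian clinical free text).
tokens = ["Patsient", "võtab", "ibuprofeeni", "."]
word_labels = ["O", "O", "B-DRUG", "O"]

encoding = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")

# Align word-level labels to subword tokens; special tokens get -100 so the
# loss function ignores them.
label_ids = []
for word_id in encoding.word_ids(batch_index=0):
    if word_id is None:
        label_ids.append(-100)
    else:
        label_ids.append(LABELS.index(word_labels[word_id]))
labels = torch.tensor([label_ids])

# A single fine-tuning step: the forward pass returns the token-classification
# loss, which is backpropagated through the pre-trained encoder.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
outputs = model(**encoding, labels=labels)
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```

In a human-in-the-loop setup, a model trained this way would propose annotations on new documents, annotators would correct them, and the corrected examples would be added to the training set for the next fine-tuning round.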

Keywords: BERT; information extraction; medical texts; named entity recognition; natural language processing.

MeSH terms

  • Health Facilities
  • Humans
  • Information Storage and Retrieval*
  • Language
  • Natural Language Processing*
  • Neural Networks, Computer