Evaluation of clinical named entity recognition methods for Serbian electronic health records

Aleksandar Kaplar; Milan Stošović; Aleksandra Kaplar; Voin Brković; Radomir Naumović; Aleksandar Kovačević

doi:10.1016/j.ijmedinf.2022.104805

Int J Med Inform. 2022 Aug:164:104805. doi: 10.1016/j.ijmedinf.2022.104805. Epub 2022 May 25.

Authors

Aleksandar Kaplar¹, Milan Stošović², Aleksandra Kaplar¹, Voin Brković², Radomir Naumović², Aleksandar Kovačević³

Affiliations

¹ Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia.
² Clinic of Nephrology, University Clinical Center of Serbia, Belgrade, Serbia.
³ Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia. Electronic address: kocha78@uns.ac.rs.

PMID: 35653828
DOI: 10.1016/j.ijmedinf.2022.104805

Abstract

Background and objectives: The importance of clinical natural language processing (NLP) has increased with the adoption of electronic health records (EHRs). One of the critical tasks in clinical NLP is named entity recognition (NER). Clinical NER in the Serbian language is a severely under-researched area. The few approaches that have been proposed so far are based on rules or machine-learning models with hand-crafted features, while current state-of-the-art models have not been explored. The objective of this paper is to assess the performance of state-of-the-art NER methods on clinical narratives in the Serbian language.

Materials and methods: We designed an experimental setup for a comprehensive evaluation of state-of-the-art NER models. The gold standard corpus used for the evaluation consists of discharge summaries from the Clinic for Nephrology at the University Clinical Center of Serbia. The following models were evaluated: conditional random fields (CRF), multilingual transformers (BERT Multilingual and XLM RoBERTa), long short-term memory (LSTM) recurrent neural networks, and ensembles of these models. In addition, we investigated the necessity of the pretraining task for transformer-based models and the use of pretrained word embeddings with the LSTM model.

Results: Our results show that, among the individual models, CRF had the best precision, the pretrained BERT Multilingual model had the best recall, and the LSTM model had the best F1 score. The best overall performance, an F1 score of 0.892, was achieved by combining the existing models in a majority voting ensemble. The presented results are similar to the inter-annotator agreement on our gold standard corpus and are comparable to existing state-of-the-art results for clinical NER reported in the literature.
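The abstract does not say how the majority voting ensemble was implemented. As a minimal sketch only, token-level voting over label sequences could look like the following, assuming every model labels the same tokenization. The function name `majority_vote`, the tie-breaking rule (the earliest listed model wins), and the entity labels in the example are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token labels from several models by majority vote.

    predictions[m][t] is the label that model m assigns to token t.
    Assumption: ties go to the tied label predicted by the earliest
    model in `predictions`; the paper does not state a tie-breaking rule.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same tokens")

    combined = []
    for token_labels in zip(*predictions):
        counts = Counter(token_labels)
        top_count = max(counts.values())
        winners = {label for label, c in counts.items() if c == top_count}
        # The first model (in list order) whose label is among the winners decides.
        combined.append(next(label for label in token_labels if label in winners))
    return combined


# Hypothetical BIO labels for a three-token sentence; not from the paper's corpus.
crf = ["B-DRUG", "O", "B-SYMPTOM"]
bert = ["B-DRUG", "I-DRUG", "O"]
lstm = ["B-DRUG", "I-DRUG", "B-SYMPTOM"]

print(majority_vote([crf, bert, lstm]))
# ['B-DRUG', 'I-DRUG', 'B-SYMPTOM']
```

Voting on each token independently can yield label sequences that violate the BIO scheme (for example, an `I-` tag with no preceding `B-` tag), so a practical ensemble may need a repair step or entity-level voting instead.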

Conclusion: Existing state-of-the-art models, applied without major modifications, can provide viable results for clinical named entity recognition in languages with the complexity of Serbian.

Keywords: BERT; Clinical named entity recognition; Electronic health records; Serbian language; Transformers.

MeSH terms

Electronic Health Records*
Humans
Machine Learning
Natural Language Processing*
Neural Networks, Computer
Serbia