Negation and uncertainty detection in clinical texts written in Spanish: a deep learning-based approach

PeerJ Comput Sci. 2022 Mar 7;8:e913. doi: 10.7717/peerj-cs.913. eCollection 2022.

Abstract

Detecting negation and uncertainty is crucial for medical text mining applications; otherwise, extracted information can be incorrectly identified as real or factual events. Although several approaches have been proposed to detect negation and uncertainty in clinical texts, most efforts have focused on the English language. Most proposals developed for Spanish have focused mainly on negation detection and do not deal with uncertainty. In this paper, we propose a deep learning-based approach for both negation and uncertainty detection in clinical texts written in Spanish. The proposed approach explores two deep learning methods to achieve this goal: (i) Bidirectional Long Short-Term Memory with a Conditional Random Field layer (BiLSTM-CRF) and (ii) Bidirectional Encoder Representations from Transformers (BERT). The approach was evaluated using NUBES and IULA, two public corpora for the Spanish language. The results showed F-scores of 92% and 80% in the scope recognition task for negation and uncertainty, respectively. We also present the results of a validation process conducted using a real-life annotated dataset of clinical notes belonging to cancer patients. The proposed approach shows the feasibility of deep learning-based methods for detecting negation and uncertainty in Spanish clinical texts. Experiments also highlighted that this approach improves performance in the scope recognition task compared to other proposals in the biomedical domain.
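To make the task framing concrete, the sketch below shows one common way cue and scope detection can be cast as token-level sequence labeling, together with a token-level F-score for the scope label. It is illustrative only: the tag names (`NEG_CUE`, `NEG_SCOPE`), the example sentence, and the helper functions are assumptions and are not taken from the NUBES or IULA annotation schemes, and the paper's actual evaluation protocol may differ (for example, exact-match scope scoring).

```python
from typing import List, Tuple

# Hypothetical example sentence with BIO tags (tag names are assumptions).
tokens = ["Paciente", "sin", "fiebre", "ni", "tos", "."]
gold = ["O", "B-NEG_CUE", "B-NEG_SCOPE", "I-NEG_SCOPE", "I-NEG_SCOPE", "O"]
# A hypothetical model prediction that truncates the scope after "fiebre".
pred = ["O", "B-NEG_CUE", "B-NEG_SCOPE", "O", "O", "O"]


def extract_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collect (label, start, end_exclusive) spans from a BIO tag sequence."""
    spans = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return spans


def token_f1(gold_tags: List[str], pred_tags: List[str], label: str) -> float:
    """Token-level F-score for one label, ignoring the B-/I- prefix."""
    gold_idx = {i for i, t in enumerate(gold_tags) if t[2:] == label}
    pred_idx = {i for i, t in enumerate(pred_tags) if t[2:] == label}
    true_pos = len(gold_idx & pred_idx)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_idx)
    recall = true_pos / len(gold_idx)
    return 2 * precision * recall / (precision + recall)


gold_spans = extract_spans(gold)
pred_spans = extract_spans(pred)
scope_f1 = token_f1(gold, pred, "NEG_SCOPE")
```

In this toy case the predicted scope covers one of the three gold scope tokens, so precision is perfect while recall is low. A BiLSTM-CRF or a fine-tuned BERT token classifier would produce the `pred` sequence; uncertainty can be handled the same way with a second pair of labels.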

Keywords: Clinical texts; Deep learning; Natural Language Processing; Negation and Uncertainty detection; Text mining.

Grants and funding

This paper is supported by the European Union’s Horizon 2020 research and innovation program under grant agreement No. 875160, project CLARIFY (Cancer Long Survivors Artificial Intelligence Follow Up). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.