Negation recognition in clinical natural language processing using a combination of the NegEx algorithm and a convolutional neural network

Guillermo Argüello-González; José Aquino-Esperanza; Daniel Salvador; Rosa Bretón-Romero; Carlos Del Río-Bermudez; Jorge Tello; Sebastian Menke

doi:10.1186/s12911-023-02301-5

BMC Med Inform Decis Mak. 2023 Oct 13;23(1):216. doi: 10.1186/s12911-023-02301-5.

Authors

Guillermo Argüello-González¹ ², José Aquino-Esperanza¹ ³, Daniel Salvador¹, Rosa Bretón-Romero⁴, Carlos Del Río-Bermudez⁴, Jorge Tello¹, Sebastian Menke⁵

Affiliations

¹ MedSavana SL, Madrid, 28004, Spain.
² Statistics and Operations Research, University of Oviedo, Oviedo, 33003, Spain.
³ Faculty of Medicine and Health Sciences, University of Barcelona, Barcelona, 08007, Spain.
⁴ Savana Research SL, Madrid, 28004, Spain.
⁵ MedSavana SL, Madrid, 28004, Spain. smenke@savanamed.com.

Abstract

Background: Important clinical information about patients is contained in the unstructured free-text fields of Electronic Health Records (EHRs). While this information can be extracted using clinical Natural Language Processing (cNLP), recognizing negation modifiers remains an important challenge. A wide range of cNLP applications have been developed to detect the negation of medical entities in clinical free text; however, effective solutions for languages other than English are scarce. This study aimed to develop a solution for negation recognition in Spanish EHRs based on a combination of a customized rule-based NegEx layer and a convolutional neural network (CNN).

Methods: Based on our previous experience in real-world evidence (RWE) studies using information embedded in EHRs, negation recognition was simplified into a binary problem ('affirmative' vs. 'non-affirmative' class). For the NegEx layer, negation rules were obtained from a publicly available Spanish corpus and enriched with custom ones, whereas the CNN binary classifier was trained on EHRs annotated for clinical named entities (cNEs) and negation markers by medical doctors.
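The rule-based idea behind a NegEx-style layer can be illustrated with a minimal sketch. It assumes a small, hypothetical list of Spanish trigger and terminator words and a fixed look-back window; none of these values come from the paper, whose actual rule set and CNN component are not described in the abstract and are not reproduced here.

```python
import re

# Hypothetical trigger and terminator lists, chosen only for illustration.
# The rules used in the paper are not given in the abstract.
PRE_TRIGGERS = {"no", "sin", "niega", "descarta"}
TERMINATORS = {"pero", "aunque", "excepto"}
WINDOW = 5  # assumed maximum number of tokens scanned before the entity


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def classify_entity(sentence, entity):
    """Label an entity 'non-affirmative' if a trigger precedes it within
    WINDOW tokens and no terminator lies in between; else 'affirmative'."""
    tokens = tokenize(sentence)
    ent_tokens = tokenize(entity)
    n = len(ent_tokens)

    # Locate the first occurrence of the entity in the token sequence.
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == ent_tokens:
            start = i
            break
    else:
        raise ValueError("entity not found in sentence")

    # Scan backwards from the entity, stopping at a terminator.
    for j in range(start - 1, max(start - 1 - WINDOW, -1), -1):
        if tokens[j] in TERMINATORS:
            break
        if tokens[j] in PRE_TRIGGERS:
            return "non-affirmative"
    return "affirmative"


if __name__ == "__main__":
    sentence = "Paciente sin fiebre pero con tos"
    print(classify_entity(sentence, "fiebre"))
    print(classify_entity(sentence, "tos"))
```

In this sketch the terminator "pero" stops the scope of "sin", so only the first entity is treated as negated. A purely rule-based layer of this kind misses negations phrased outside its trigger list, which is one plausible motivation for pairing it with a learned classifier.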

Results: The proposed negation recognition pipeline obtained precision, recall, and F1-score of 0.93, 0.94, and 0.94 for the 'affirmative' class, and 0.86, 0.84, and 0.85 for the 'non-affirmative' class, respectively. To validate the generalization capabilities of our methodology, we applied the negation recognition pipeline to EHRs (6,710 cNEs) from a different data source distribution than the training corpus and obtained consistent performance metrics for the 'affirmative' and 'non-affirmative' classes (0.95, 0.97, and 0.96; and 0.90, 0.83, and 0.86 for precision, recall, and F1-score, respectively). Lastly, we evaluated the pipeline against two publicly available Spanish negation corpora, IULA and NUBes, obtaining state-of-the-art metrics (1.00, 0.99, and 0.99; and 1.00, 0.93, and 0.96 for precision, recall, and F1-score, respectively).
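For readers who want to relate the three reported metrics, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded values quoted above; the function name is illustrative, and because the inputs are already rounded to two decimals, a recomputed value may differ from the reported one by about 0.01.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (precision, recall) pairs as reported in the abstract.
    reported = {
        "non-affirmative, test set": (0.86, 0.84),
        "non-affirmative, external EHRs": (0.90, 0.83),
        "NUBes": (1.00, 0.93),
    }
    for name, (p, r) in reported.items():
        print(f"{name}: F1 = {f1_score(p, r):.2f}")
```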

Conclusion: Negation recognition is a source of low precision in the retrieval of cNEs from EHR free text. Combining a customized rule-based NegEx layer with a CNN binary classifier outperformed many other current approaches. RWE studies benefit greatly from the correct recognition of negation, as it reduces false-positive detections of cNEs that would otherwise reduce the credibility of cNLP systems.

Keywords: CNN; Clinical Natural Language Processing; Electronic health records; NegEx; Negation.

MeSH terms

Algorithms*
Electronic Health Records
Humans
Language
Natural Language Processing*
Neural Networks, Computer