On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions

Maite Oronoz; Koldo Gojenola; Alicia Pérez; Arantza Díaz de Ilarraza; Arantza Casillas

doi:10.1016/j.jbi.2015.06.016

J Biomed Inform. 2015 Aug:56:318-32. doi: 10.1016/j.jbi.2015.06.016. Epub 2015 Jun 30.

Authors

Maite Oronoz¹, Koldo Gojenola¹, Alicia Pérez¹, Arantza Díaz de Ilarraza¹, Arantza Casillas²

Affiliations

¹ IXA Group, University of the Basque Country (UPV-EHU), Computer Engineering Faculty, P. Manuel Lardizabal, 1, 20018 Donostia-San Sebastián, Spain.
² IXA Group, University of the Basque Country (UPV-EHU), Computer Engineering Faculty, P. Manuel Lardizabal, 1, 20018 Donostia-San Sebastián, Spain. Electronic address: arantza.casillas@ehu.eus.

PMID: 26141794
DOI: 10.1016/j.jbi.2015.06.016

Abstract

The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.

Keywords: Adverse drug reaction; Clinical text; Gold standard; Text mining.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adverse Drug Reaction Reporting Systems*
Algorithms
Automation
Data Mining / methods*
Drug-Related Side Effects and Adverse Reactions*
Electronic Health Records / standards*
Language
Linguistics
Machine Learning
Natural Language Processing*
Pharmaceutical Preparations
Pharmacovigilance
Predictive Value of Tests
Reproducibility of Results
Translating

Substances

Pharmaceutical Preparations