Semi-supervised medical entity recognition: A study on Spanish and Swedish clinical corpora

J Biomed Inform. 2017 Jul:71:16-30. doi: 10.1016/j.jbi.2017.05.009. Epub 2017 May 16.

Abstract

Objective: This study investigates medical entity recognition in Electronic Health Records (EHRs) written in Spanish and Swedish. A robust representation of the entities is of particular importance, and we used unsupervised methods to generate such representations.

Methods: The significance of this work lies in its experimental design: the experiments were carried out under the same conditions for both languages. Several classification approaches were explored: maximum probability, Conditional Random Fields (CRF), Perceptron and Support Vector Machines (SVM). The classifiers were enhanced by means of ensembles of semantic spaces and ensembles of Brown trees. To mitigate data sparsity without significantly increasing the dimension of the decision space, we propose clustered representations: hierarchical Brown clustering, represented by trees, and vector quantization of each semantic space.
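
To make the clustered representations concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a Brown tree is exposed as one bit path per word, and that vector quantization means replacing a word's vector in a semantic space with the index of its nearest centroid. The function names, the prefix lengths, the toy Spanish vocabulary and the two-dimensional vectors are all hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def brown_prefix_features(bit_path, prefix_lengths=(2, 4, 6)):
    """Turn a Brown-cluster bit path into coarse-to-fine categorical features.

    Shorter prefixes correspond to nodes higher in the Brown tree,
    i.e. to broader word classes. The prefix lengths are an assumption.
    """
    return {f"brown_p{n}": bit_path[:n] for n in prefix_lengths if len(bit_path) >= n}


def quantize(vector, centroids):
    """Return the index of the centroid closest to `vector`."""
    return min(range(len(centroids)), key=lambda i: dist(vector, centroids[i]))


def token_features(token, brown_paths, embeddings, centroids):
    """Build a sparse feature dict for one token.

    The dense vector is replaced by a single cluster ID, so each
    semantic space contributes one categorical feature instead of
    one real-valued feature per dimension.
    """
    word = token.lower()
    feats = {"word": word}
    path = brown_paths.get(word)
    if path is not None:
        feats.update(brown_prefix_features(path))
    vec = embeddings.get(word)
    if vec is not None:
        feats["sem_cluster"] = quantize(vec, centroids)
    return feats


# Hypothetical toy resources (not taken from the study).
brown_paths = {"fiebre": "110100", "dolor": "110111"}
embeddings = {"fiebre": (0.9, 0.1), "dolor": (0.8, 0.2), "paciente": (0.1, 0.9)}
centroids = [(1.0, 0.0), (0.0, 1.0)]

fiebre_feats = token_features("Fiebre", brown_paths, embeddings, centroids)
paciente_feats = token_features("Paciente", brown_paths, embeddings, centroids)
```

In this sketch, feature dicts of this kind would be passed to a sequence classifier such as a CRF. An ensemble would repeat the `sem_cluster` feature once per semantic space and the prefix features once per Brown tree, under distinct feature names.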

Results: The semi-supervised approaches significantly improved on standard supervised techniques for both languages. Moreover, clustering the semantic spaces contributed to the quality of the entity recognition while keeping the dimension of the feature space two orders of magnitude lower than when using the semantic spaces directly.

Conclusions: The contributions of this study are: (a) a set of thorough experiments that enable comparison of the influence of different types of features on different classifiers, exploring two languages other than English; and (b) the use of ensembles of clusters of Brown trees and semantic spaces on EHRs to tackle the scarcity of available annotated data.

Keywords: Health records; Medical entity recognition; Supervised and unsupervised learning.

MeSH terms

  • Cluster Analysis
  • Data Curation
  • Electronic Health Records*
  • Humans
  • Machine Learning*
  • Semantics*
  • Sweden