Automatic lexicon acquisition for a medical cross-language information retrieval system

Stud Health Technol Inform. 2005:116:829-34.

Abstract

We present a method for automatically acquiring a multilingual medical lexicon (for Spanish and Swedish) for use in a medical cross-language text retrieval system. The method combines seed lexicons with parallel corpora derived from the UMLS Metathesaurus. The seed lexicons for Spanish and Swedish are generated automatically from previously hand-built Portuguese, German and English sources. The resulting lexical and semantic hypotheses are then validated iteratively, using co-occurrence patterns of hypothesized translation synonyms in the parallel corpora.
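
The co-occurrence validation step can be illustrated with a minimal sketch. The abstract does not say how co-occurrence is scored, so everything specific here is an assumption made for illustration: the Dice coefficient as the association measure, the acceptance threshold, the function names `dice_scores` and `validate`, and the toy English-Spanish data. The sketch performs a single validation pass and does not model the iterative procedure.

```python
from collections import Counter


def dice_scores(aligned_units, candidates):
    """Score hypothesized translation pairs by co-occurrence in aligned units.

    aligned_units: list of (source_tokens, target_tokens) pairs, one per
                   aligned unit of the parallel corpus.
    candidates:    iterable of (source_term, target_term) hypotheses.
    Returns a dict mapping each candidate pair to its Dice coefficient
    (an assumed measure, not necessarily the one used by the authors).
    """
    candidates = set(candidates)
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()

    for src_tokens, tgt_tokens in aligned_units:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        # Count each term at most once per aligned unit.
        src_freq.update(src_set)
        tgt_freq.update(tgt_set)
        for s, t in candidates:
            if s in src_set and t in tgt_set:
                pair_freq[(s, t)] += 1

    scores = {}
    for s, t in candidates:
        denom = src_freq[s] + tgt_freq[t]
        scores[(s, t)] = 2 * pair_freq[(s, t)] / denom if denom else 0.0
    return scores


def validate(aligned_units, candidates, threshold=0.5):
    """Keep the hypotheses whose score reaches the (assumed) threshold."""
    scores = dice_scores(aligned_units, candidates)
    return {pair for pair, score in scores.items() if score >= threshold}


if __name__ == "__main__":
    # Toy aligned units, invented for illustration only.
    units = [
        (["heart", "disease"], ["enfermedad", "corazon"]),
        (["heart", "failure"], ["insuficiencia", "corazon"]),
        (["liver", "disease"], ["enfermedad", "higado"]),
    ]
    hypotheses = [
        ("heart", "corazon"),
        ("heart", "enfermedad"),
        ("disease", "enfermedad"),
    ]
    for pair in sorted(validate(units, hypotheses, threshold=0.6)):
        print(pair)
```

In this toy setting a wrong hypothesis such as ("heart", "enfermedad") co-occurs in only some of the units where either term appears, so it scores lower than the correct pairs and falls below the threshold.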

MeSH terms

  • Humans
  • Information Storage and Retrieval
  • Language*
  • Multilingualism
  • Natural Language Processing*
  • Semantics
  • Unified Medical Language System