Semantic transference for enriching multilingual biomedical knowledge resources

J Biomed Inform. 2015 Dec:58:1-10. doi: 10.1016/j.jbi.2015.08.026. Epub 2015 Sep 18.

Abstract

Biomedical knowledge resources (KRs) are mainly expressed in English, and many applications using them suffer from the scarcity of knowledge in non-English languages. The goal of the present work is to take maximum profit from existing multilingual biomedical KRs lexicons to enrich their non-English counterparts. We propose to combine different automatic methods to generate pair-wise language alignments. More specifically, we use two well-known translation methods (GIZA++ and Moses), and we propose a new ad hoc method specially devised for multilingual KRs. Then, resulting alignments are used to transfer semantics between KRs across their languages. Transference quality is ensured by checking the semantic coherence of the generated alignments. Experiments have been carried out over the Spanish, French and German UMLS Metathesaurus counterparts. As a result, the enriched Spanish KR can grow up to 1,514,217 concepts (originally 286,659), the French KR up to 1,104,968 concepts (originally 83,119), and the German KR up to 1,136,020 concepts (originally 86,842).

Keywords: Multilingual biomedical knowledge resources; Semantic transference; Term alignment.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms
  • Knowledge
  • Semantics*
  • Vocabulary, Controlled*