3000PA-Towards a National Reference Corpus of German Clinical Language

Stud Health Technol Inform. 2018;247:26-30.

Abstract

We introduce 3000PA, a clinical document corpus composed of 3,000 electronic patient records (EPRs) from three different clinical sites, which will serve as the backbone of a national reference language resource for German clinical NLP. We outline its design principles, report results from a medication annotation campaign, and evaluate a first medication information extraction prototype on a subset of 3000PA.

Keywords: German language; clinical text corpus; medication information extraction.

MeSH terms

  • Humans
  • Information Storage and Retrieval*
  • Language
  • Natural Language Processing*