Final Report on the German Clinical Reference Corpus 3000PA

Stud Health Technol Inform. 2024 Jan 25;310:599-603. doi: 10.3233/SHTI231035.

Abstract

We report on one of the outcomes of a large-scale German research program, the Medical Informatics Initiative (MII), aimed at developing a solid data and software infrastructure for German-language clinical natural language processing. Within this framework, we have developed 3000PA, a national clinical reference corpus composed of patient records from three clinical university sites and annotated with multiple semantic annotation layers (medical named entities, semantic and temporal relations between entities, and certainty and negation information related to entities and relations). This non-sharable corpus has been complemented by three sharable ones (JSYNCC, GGPONC, and GRASCCO). Overall, 3000PA, JSYNCC, and GRASCCO feature about 2.1 million metadata points.

Keywords: Clinical text corpus; German language; annotation; clinical NLP.

MeSH terms

  • Humans
  • Language*
  • Medical Informatics*
  • Metadata
  • Natural Language Processing
  • Semantics