Enhanced information retrieval from narrative German-language clinical text documents using automated document classification

Stud Health Technol Inform. 2008;136:473-8.

Abstract

The number of narrative clinical text documents stored in the Electronic Patient Records (EPR) of Hospital Information Systems is increasing. Physicians spend a lot of time searching these documents for the patient-related information they need for medical decision making. Efficient and topical retrieval of relevant patient-related information is therefore an important task for an EPR system. This paper describes a prototype medical information retrieval system (MIRS) for clinical text documents, implemented with the open-source information retrieval framework Apache Lucene. In addition, a multi-label classification system based on the open-source data mining framework WEKA generates metadata from the clinical text document set. This metadata is used to influence the rank order of the documents retrieved by physicians. Combining information retrieval with automated document classification offers an enhanced approach that lets physicians, and in the near future patients, define their information needs with respect to the information stored in an EPR. The system has been designed as a J2EE Web application. Initial findings are based on a sample of 18,000 unstructured clinical text documents written in German.
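The abstract does not say how the WEKA-generated metadata is combined with Lucene's ranking. The following is a minimal sketch, assuming a simple multiplicative boost. Each document is first scored with plain TF-IDF (a stand-in for Lucene's relevance scoring, not Lucene itself). Then any document whose classifier labels overlap a user-selected set of categories has its score multiplied by a hypothetical `boost` factor. The function names, label names, example notes and the `boost` value are illustrative assumptions, not details taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split into word tokens; in Python 3, \w also matches
    # German umlauts and the letter ß.
    return re.findall(r"\w+", text.lower())


def tfidf_scores(query: str, docs: dict[str, str]) -> dict[str, float]:
    """Plain TF-IDF relevance score per document (a stand-in for Lucene)."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {doc_id: 0.0 for doc_id in docs}
    for term in set(tokenize(query)):
        df = sum(1 for counts in term_counts.values() if term in counts)
        if df == 0:
            continue  # term occurs in no document
        idf = math.log(n_docs / df) + 1.0  # +1 so terms found everywhere still count
        for doc_id, counts in term_counts.items():
            scores[doc_id] += counts[term] * idf
    return scores


def rerank(
    query: str,
    docs: dict[str, str],
    labels: dict[str, set[str]],
    wanted_labels: set[str],
    boost: float = 2.0,  # hypothetical boost factor
) -> list[tuple[str, float]]:
    """Rank matching documents, boosting those whose labels overlap wanted_labels."""
    ranked = []
    for doc_id, score in tfidf_scores(query, docs).items():
        if score == 0.0:
            continue  # only documents that match the query are returned
        if labels.get(doc_id, set()) & wanted_labels:
            score *= boost  # classifier metadata raises this document's rank
        ranked.append((doc_id, score))
    # Highest score first; ties broken by document id for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical example: three short German notes and the labels a
# multi-label classifier might have assigned to them.
docs = {
    "d1": "Entlassungsbrief: Diabetes mellitus Typ 2, Insulin angepasst.",
    "d2": "Radiologischer Befund: Thorax unauffällig, kein Hinweis auf Diabetes-Folgen.",
    "d3": "OP-Bericht: Appendektomie komplikationslos.",
}
labels = {
    "d1": {"discharge_letter", "endocrinology"},
    "d2": {"radiology"},
    "d3": {"surgery"},
}

ranked = rerank("Diabetes", docs, labels, wanted_labels={"radiology"})
print(ranked)
```

Both d1 and d2 contain "Diabetes" once and receive the same TF-IDF score, so without any preferred category they tie. Selecting the "radiology" category doubles d2's score and moves it to the top, while d3 is not returned at all. A multiplicative boost keeps the text-based relevance as the primary signal and lets the classification metadata only reorder documents that already match the query.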

MeSH terms

  • Abstracting and Indexing*
  • Austria
  • Database Management Systems
  • Documentation / classification*
  • Hospital Information Systems
  • Humans
  • Information Storage and Retrieval*
  • Internet
  • Language*
  • Medical Records Systems, Computerized*
  • Narration*
  • Natural Language Processing*
  • Software
  • Unified Medical Language System
  • Vocabulary, Controlled