Enhanced information retrieval from narrative German-language clinical text documents using automated document classification

Stephan Spat; Bruno Cadonna; Ivo Rakovac; Christian Gütl; Hubert Leitner; Günther Stark; Peter Beck

Stud Health Technol Inform. 2008:136:473-8.

Authors

Stephan Spat¹, Bruno Cadonna, Ivo Rakovac, Christian Gütl, Hubert Leitner, Günther Stark, Peter Beck

Affiliation

¹ Institute of Medical Technologies and Health Management, Joanneum Research Forschungsgesellschaft mbH, Graz, Austria. stephan.spat@joanneum.at

PMID: 18487776

Abstract

The number of narrative clinical text documents stored in the Electronic Patient Records (EPR) of Hospital Information Systems is increasing. Physicians spend considerable time searching these documents for patient-related information relevant to medical decision making. Efficient and topical retrieval of relevant patient-related information is therefore an important task in an EPR system. This paper describes the prototype of a medical information retrieval system (MIRS) for clinical text documents. The prototype was implemented with the open-source information retrieval framework Apache Lucene. In addition, a multi-label classification system based on the open-source data mining framework WEKA generates metadata from the clinical text document set. This metadata is used to influence the rank order of the documents retrieved by physicians. Combining information retrieval with automated document classification offers an enhanced way for physicians, and in the near future patients, to express their information needs against the information stored in an EPR. The system has been designed as a J2EE web application. First findings are based on a sample of 18,000 unstructured clinical text documents written in German.
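
The idea of letting classifier-generated metadata influence the rank order can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the metadata enters the ranking, so the multiplicative boost below is an assumption, and every name (`Hit`, `rerank`, `boost`, the label names and scores) is hypothetical. Python is used only for brevity; the prototype itself is described as built on Apache Lucene and WEKA.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Hit:
    """One retrieved document (hypothetical structure, for illustration only)."""
    doc_id: str
    text_score: float  # relevance score from the full-text search engine
    # Labels assigned by a multi-label classifier, mapped to a confidence in [0, 1].
    labels: Dict[str, float] = field(default_factory=dict)


def rerank(hits: List[Hit], wanted_labels: Iterable[str], boost: float = 0.5) -> List[Hit]:
    """Sort hits by text score, boosted by the confidence of matching labels.

    Assumed scoring rule: text_score * (1 + boost * sum of matching confidences).
    """
    wanted = set(wanted_labels)

    def combined(hit: Hit) -> float:
        label_match = sum(hit.labels.get(label, 0.0) for label in wanted)
        return hit.text_score * (1.0 + boost * label_match)

    return sorted(hits, key=combined, reverse=True)


# Made-up example data: three hits with full-text scores and classifier labels.
hits = [
    Hit("doc-a", 0.80, {"radiology": 0.9}),
    Hit("doc-b", 0.70, {"cardiology": 0.95, "discharge_letter": 0.8}),
    Hit("doc-c", 0.75),
]

ranked = rerank(hits, wanted_labels={"cardiology"})
print([h.doc_id for h in ranked])  # prints ['doc-b', 'doc-a', 'doc-c']
```

With no preferred labels, the order falls back to the plain full-text ranking (`doc-a`, `doc-c`, `doc-b`). Once the user asks for cardiology documents, `doc-b` moves to the top. A multiplicative boost keeps the text score as the primary signal; an additive rule or a hard filter would be equally plausible alternatives.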

MeSH terms

Abstracting and Indexing*
Austria
Database Management Systems
Documentation / classification*
Hospital Information Systems
Humans
Information Storage and Retrieval*
Internet
Language*
Medical Records Systems, Computerized*
Narration*
Natural Language Processing*
Software
Unified Medical Language System
Vocabulary, Controlled