Clinical application of the integrated multicenter discharge summary database

Stud Health Technol Inform. 2015:216:1120.

Abstract

We conducted a multi-year project to collect discharge summaries from multiple hospitals, built a large text database from them to construct a common document vector space, and developed various applications on top of it. We extracted 243,907 discharge summaries from seven hospitals. Term structure and the number of terms differed between hospitals; however, the differences between diseases were similar across hospitals. We built the vector space using the TF-IDF method. We performed a cross-match analysis of DPC selection among the seven hospitals. About 80% of cases were correctly matched. Using model data from other hospitals reduced the selection rate to around 10%; however, integrated model data from all hospitals restored the selection rate.
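The cross-match idea can be illustrated with a minimal sketch. The abstract does not describe the matching procedure, so everything here is an assumption made for illustration: whitespace tokenization, a smoothed IDF, one unit-length centroid per code, and selection of the code whose centroid has the highest cosine similarity. The toy summaries and the labels `resp` and `ortho` are hypothetical and are not real DPC codes or study data.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: whitespace tokenization; real summaries would need
    # language-specific morphological analysis.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency over the training documents.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def normalize(vec):
    # Scale a sparse vector to unit length (empty stays empty).
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def tfidf(doc, idf):
    # Terms unseen at training time are dropped.
    tf = Counter(t for t in tokenize(doc) if t in idf)
    return normalize({t: c * idf[t] for t, c in tf.items()})

def dot(a, b):
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def build_model(records):
    # records: list of (summary_text, code) pairs.
    idf = fit_idf([text for text, _ in records])
    sums = defaultdict(Counter)
    for text, code in records:
        for term, w in tfidf(text, idf).items():
            sums[code][term] += w
    centroids = {code: normalize(vec) for code, vec in sums.items()}
    return idf, centroids

def predict(text, model):
    idf, centroids = model
    vec = tfidf(text, idf)
    if not vec:
        return None  # no known vocabulary, so no code can be selected
    # Both vectors are unit length, so the dot product is the cosine.
    return max(centroids, key=lambda code: dot(vec, centroids[code]))

def cross_match(train, test):
    # Fraction of test summaries whose code is selected correctly.
    model = build_model(train)
    return sum(predict(text, model) == code for text, code in test) / len(test)

# Hypothetical toy data standing in for two hospitals.
hospital_a = [
    ("fever cough pneumonia antibiotics chest xray", "resp"),
    ("cough sputum pneumonia oxygen antibiotics", "resp"),
    ("fall hip fracture surgery rehabilitation", "ortho"),
    ("femur fracture fixation surgery rehabilitation", "ortho"),
]
hospital_b = [
    ("pneumonia infiltrate ceftriaxone fever", "resp"),
    ("wrist fracture cast fall", "ortho"),
]

print(cross_match(hospital_a, hospital_b))
print(predict("wrist cast", build_model(hospital_a)))
print(predict("wrist cast", build_model(hospital_a + hospital_b)))
```

In this toy setting, a summary written only in hospital B's vocabulary cannot be matched by a model built from hospital A alone, while a model built from both hospitals can match it. This is the same qualitative effect the abstract reports for the integrated model data, shown here on made-up text only.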

MeSH terms

  • Data Accuracy
  • Data Mining / methods*
  • Database Management Systems
  • Databases, Factual*
  • Electronic Health Records / organization & administration*
  • Japan
  • Medical Record Linkage / methods*
  • Multicenter Studies as Topic
  • Natural Language Processing
  • Patient Discharge Summaries / statistics & numerical data*
  • Systems Integration
  • Vocabulary, Controlled*