Optimizing healthcare research data warehouse design through past COSTAR query analysis

Proc AMIA Symp. 1999:892-6.

Abstract

Over the past two years we have reviewed and implemented the specifications for a large relational database (a data warehouse) to find research cohorts from data similar to that contained within the clinical COSTAR database at the Massachusetts General Hospital. A review of 16 years of COSTAR research queries was conducted to determine the most common search strategies. These search strategies are relevant to the general research community because they use the Medical Query Language (MQL) developed for the COSTAR M database, which is extremely flexible (much more so than SQL) and allows completely ad hoc searches by coded fields, text reports, and laboratory values. By reviewing these search strategies, we obtained user specifications for a research-oriented healthcare data warehouse that could support 90% of the queries. The data warehouse was implemented in a relational database using a star schema, which allows highly optimized analytical processing. As a result, queries that ran slowly in the M database ran very rapidly in the relational database, and the data warehouse could scale effectively.
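
As an illustration of the star schema approach described in the abstract, the following is a minimal sketch of a cohort search against a single fact table joined to dimension tables. It is not the schema used in the paper: the table names (patient_dim, concept_dim, observation_fact), the columns, the sample rows, and the helper functions are all hypothetical, and SQLite stands in for whatever relational database was actually used.

```python
import sqlite3


def build_demo_warehouse():
    """Create an in-memory star schema with a few made-up rows (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables describe patients and coded concepts.
        CREATE TABLE patient_dim (
            patient_id INTEGER PRIMARY KEY,
            birth_year INTEGER,
            sex        TEXT
        );
        CREATE TABLE concept_dim (
            concept_id INTEGER PRIMARY KEY,
            kind       TEXT,   -- e.g. 'diagnosis' or 'lab'
            name       TEXT
        );
        -- The central fact table holds one row per observation.
        CREATE TABLE observation_fact (
            patient_id INTEGER REFERENCES patient_dim(patient_id),
            concept_id INTEGER REFERENCES concept_dim(concept_id),
            obs_date   TEXT,
            num_value  REAL    -- NULL for non-numeric observations
        );
        CREATE INDEX idx_fact_concept
            ON observation_fact (concept_id, patient_id);
        """
    )
    conn.executemany(
        "INSERT INTO patient_dim VALUES (?, ?, ?)",
        [(1, 1940, "F"), (2, 1955, "M"), (3, 1962, "F")],
    )
    conn.executemany(
        "INSERT INTO concept_dim VALUES (?, ?, ?)",
        [(10, "diagnosis", "diabetes mellitus"), (20, "lab", "glucose")],
    )
    conn.executemany(
        "INSERT INTO observation_fact VALUES (?, ?, ?, ?)",
        [
            (1, 10, "1998-01-05", None),
            (1, 20, "1998-01-05", 210.0),
            (2, 10, "1998-02-11", None),
            (2, 20, "1998-02-11", 95.0),
            (3, 20, "1998-03-02", 250.0),
        ],
    )
    return conn


def find_cohort(conn, diagnosis, lab, min_value):
    """Return sorted patient ids having the diagnosis and a lab value >= min_value."""
    sql = """
        SELECT DISTINCT p.patient_id
        FROM patient_dim p
        JOIN observation_fact dx ON dx.patient_id = p.patient_id
        JOIN concept_dim dc      ON dc.concept_id = dx.concept_id
        JOIN observation_fact lb ON lb.patient_id = p.patient_id
        JOIN concept_dim lc      ON lc.concept_id = lb.concept_id
        WHERE dc.kind = 'diagnosis' AND dc.name = ?
          AND lc.kind = 'lab'       AND lc.name = ?
          AND lb.num_value >= ?
        ORDER BY p.patient_id
    """
    return [row[0] for row in conn.execute(sql, (diagnosis, lab, min_value))]


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    print(find_cohort(warehouse, "diabetes mellitus", "glucose", 200.0))
```

In this sketch a search that combines a coded field with a laboratory value becomes a set of joins from the fact table to its dimensions, which is the kind of query a relational engine can typically serve from indexes.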

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Databases as Topic*
  • Health Services Research*
  • Hospital Information Systems
  • Humans
  • Information Storage and Retrieval / methods*
  • Programming Languages