Leaf: an open-source, model-agnostic, data-driven web application for cohort discovery and translational biomedical research

J Am Med Inform Assoc. 2020 Jan 1;27(1):109-118. doi: 10.1093/jamia/ocz165.

Abstract

Objective: Academic medical centers and health systems are increasingly challenged with supporting appropriate secondary use of clinical data. Enterprise data warehouses have emerged as central resources for these data, but often require an informatician to extract meaningful information, limiting direct access by end users. To overcome this challenge, we have developed Leaf, a lightweight self-service web application for querying clinical data from heterogeneous data models and sources.

Materials and methods: Leaf utilizes a flexible biomedical concept system to define hierarchical concepts and ontologies. Each Leaf concept contains both a textual representation and SQL query building blocks, which are exposed to users through a simple drag-and-drop interface. Leaf generates abstract syntax trees, which are compiled into dynamic SQL queries.
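The abstract does not describe Leaf's actual concept schema or compiler, so what follows is only a minimal sketch of the general technique it names: concepts that pair display text with SQL fragments, combined into an abstract syntax tree that is compiled recursively into dynamic SQL. Every name here (`Concept`, `ConceptRef`, `AllOf`, `AnyOf`, `Exclude`, `compile_node`, `compile_count`, the `sql_from`/`sql_where` fields, the `person_id` column and the toy `diagnosis`/`lab` tables) is hypothetical, and an in-memory SQLite database stands in for a data warehouse. The SQL fragments are assumed to be trusted, administrator-authored text, never end-user input.

```python
import sqlite3
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Concept:
    """Hypothetical concept: display text plus admin-authored SQL fragments."""
    ui_text: str                   # label shown in the concept tree
    sql_from: str                  # table or view holding the data
    sql_where: str                 # predicate selecting matching rows
    person_col: str = "person_id"  # assumed patient identifier column


@dataclass(frozen=True)
class ConceptRef:                  # leaf node: one dragged-in concept
    concept: Concept


@dataclass(frozen=True)
class AllOf:                       # patients matching every child (AND)
    children: Tuple["Node", ...]


@dataclass(frozen=True)
class AnyOf:                       # patients matching at least one child (OR)
    children: Tuple["Node", ...]


@dataclass(frozen=True)
class Exclude:                     # patients in `include` but not in `exclude`
    include: "Node"
    exclude: "Node"


Node = Union[ConceptRef, AllOf, AnyOf, Exclude]


def _wrap(sql: str) -> str:
    # Expose any sub-query as a single person_id column so set operators line up.
    return f"SELECT person_id FROM ({sql}) AS sub"


def compile_node(node: Node) -> str:
    """Recursively compile an AST node into SQL returning patient ids."""
    if isinstance(node, ConceptRef):
        c = node.concept
        return (f"SELECT {c.person_col} AS person_id "
                f"FROM {c.sql_from} WHERE {c.sql_where}")
    if isinstance(node, (AllOf, AnyOf)):
        if not node.children:
            raise ValueError("group must contain at least one child")
        op = " INTERSECT " if isinstance(node, AllOf) else " UNION "
        return op.join(_wrap(compile_node(ch)) for ch in node.children)
    if isinstance(node, Exclude):
        return (_wrap(compile_node(node.include)) + " EXCEPT "
                + _wrap(compile_node(node.exclude)))
    raise TypeError(f"unknown node type: {type(node).__name__}")


def compile_count(root: Node) -> str:
    """Wrap the cohort query in a distinct patient count."""
    return ("SELECT COUNT(DISTINCT person_id) AS patient_count "
            f"FROM ({compile_node(root)}) AS cohort")


def build_demo_db() -> sqlite3.Connection:
    # Toy tables standing in for a clinical data warehouse (made-up rows).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE diagnosis (person_id INTEGER, code TEXT);
        CREATE TABLE lab (person_id INTEGER, test TEXT, value REAL);
        INSERT INTO diagnosis VALUES (1, 'E11.9'), (2, 'E11.65'), (3, 'I10'), (1, 'I10');
        INSERT INTO lab VALUES (1, 'HbA1c', 7.2), (2, 'HbA1c', 5.9), (3, 'HbA1c', 6.8);
    """)
    return conn


DIABETES = Concept("Type 2 diabetes", "diagnosis", "code LIKE 'E11%'")
HIGH_A1C = Concept("HbA1c >= 6.5%", "lab", "test = 'HbA1c' AND value >= 6.5")


def count_patients(conn: sqlite3.Connection, root: Node) -> int:
    return conn.execute(compile_count(root)).fetchone()[0]


if __name__ == "__main__":
    db = build_demo_db()
    query = AllOf((ConceptRef(DIABETES), ConceptRef(HIGH_A1C)))
    print(compile_count(query))
    print(count_patients(db, query))  # patient 1 is the only one matching both
```

Each child is wrapped as a derived table before being joined with `INTERSECT`, `UNION` or `EXCEPT`, so nested groups compile to valid SQL without relying on operator precedence, which differs between database engines.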

Results: Leaf is a successful, production-supported tool at the University of Washington, which hosts a central Leaf instance, with over 300 active users, that queries an enterprise data warehouse. Through the support of UW Medicine (https://uwmedicine.org), the Institute of Translational Health Sciences (https://www.iths.org), and the National Center for Data to Health (https://ctsa.ncats.nih.gov/cd2h/), Leaf source code has been released into the public domain at https://github.com/uwrit/leaf.

Discussion: Leaf can query a single clinical database or multiple databases simultaneously, even databases that use different data models. Because data are queried in place, installation is fast and avoids costly extraction or duplication.
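How Leaf federates queries is not detailed in this abstract; the following is only a minimal sketch, under stated assumptions, of the general idea of one user-facing concept mapped onto databases with different data models and queried concurrently in place. The names `NodeMapping`, `count_at_node`, `federated_counts` and `DIABETES_BY_NODE` are hypothetical, and the two in-memory SQLite tables are toy schemas loosely modeled on OMOP-style and i2b2-style tables, with made-up rows.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NodeMapping:
    """Hypothetical per-database SQL fragments for one shared concept."""
    sql_from: str
    sql_where: str
    person_col: str


def make_omop_like_db() -> sqlite3.Connection:
    # Toy table loosely modeled on an OMOP-style condition table (made-up rows).
    # check_same_thread=False lets a worker thread use this connection.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.executescript("""
        CREATE TABLE condition_occurrence (person_id INTEGER, condition_source_value TEXT);
        INSERT INTO condition_occurrence VALUES (10, 'E11.9'), (11, 'I10'), (12, 'E11.65');
    """)
    return conn


def make_i2b2_like_db() -> sqlite3.Connection:
    # Toy table loosely modeled on an i2b2-style fact table (made-up rows).
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.executescript("""
        CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT);
        INSERT INTO observation_fact VALUES (1, 'ICD10:E11.9'), (1, 'ICD10:E11.65'), (2, 'ICD10:J45');
    """)
    return conn


# One UI concept ("Type 2 diabetes"), mapped onto each database's own data model.
DIABETES_BY_NODE: Dict[str, NodeMapping] = {
    "omop_node": NodeMapping("condition_occurrence",
                             "condition_source_value LIKE 'E11%'", "person_id"),
    "i2b2_node": NodeMapping("observation_fact",
                             "concept_cd LIKE 'ICD10:E11%'", "patient_num"),
}


def count_at_node(conn: sqlite3.Connection, m: NodeMapping) -> int:
    # Query the data where it lives; nothing is copied out except the count.
    sql = (f"SELECT COUNT(DISTINCT {m.person_col}) "
           f"FROM {m.sql_from} WHERE {m.sql_where}")
    return conn.execute(sql).fetchone()[0]


def federated_counts(conns: Dict[str, sqlite3.Connection],
                     mappings: Dict[str, NodeMapping]) -> Dict[str, int]:
    """Run the concept's query at every node concurrently; collect per-node counts."""
    with ThreadPoolExecutor(max_workers=max(1, len(conns))) as pool:
        futures = {name: pool.submit(count_at_node, conn, mappings[name])
                   for name, conn in conns.items()}
        return {name: f.result() for name, f in futures.items()}


if __name__ == "__main__":
    nodes = {"omop_node": make_omop_like_db(), "i2b2_node": make_i2b2_like_db()}
    print(federated_counts(nodes, DIABETES_BY_NODE))  # {'omop_node': 2, 'i2b2_node': 1}
```

Counts are kept per node rather than summed, because a simple sum would double-count any patient who appears in more than one database.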

Conclusions: Leaf differs from existing cohort discovery tools because it does not require a specific data model and is designed to seamlessly leverage existing user authentication systems and clinical databases in situ. We believe Leaf will be useful for health system analytics, clinical research data warehouses, precision medicine biobanks, and clinical studies involving large patient cohorts.

Keywords: biomedical informatics; cloud computing; cohort discovery; data integration; leaf; observational health data sciences and informatics.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Data Warehousing*
  • Databases as Topic
  • Humans
  • Information Storage and Retrieval / methods*
  • Internet
  • Translational Research, Biomedical*
  • Unified Medical Language System
  • User-Computer Interface*
  • Vocabulary, Controlled*