Fast and Efficient Feature Engineering for Multi-Cohort Analysis of EHR Data

Stud Health Technol Inform. 2017:235:181-185.

Abstract

We present a feature engineering framework tailored to longitudinal structured data, such as electronic health records (EHRs). To speed up feature engineering and extraction, the framework combines general-purpose plug-in extractors, a multi-cohort management mechanism, and modular memoization. Using this framework, we rapidly extracted thousands of features from large, diverse healthcare data sources across multiple projects.
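
The abstract does not describe the framework's interface, so the following is only an illustrative sketch of how the three named ingredients could fit together, not the authors' implementation. Every name in it (Event, Cohort, EXTRACTORS, extractor, extract, stats) and the toy data are assumptions made for this example. Extractors are registered as plug-ins, a cohort fixes the patient set and the cut-off day, and results are memoized per (cohort, extractor, code) so repeated requests are not recomputed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical event record: (patient_id, day, code, value).
Event = Tuple[str, int, str, float]


@dataclass(frozen=True)
class Cohort:
    """Hypothetical cohort: a patient set plus an index day.

    Only events strictly before index_day are visible to extractors.
    """
    name: str
    patient_ids: frozenset
    index_day: int


# Plug-in registry: extractor name -> function(history, code) -> float.
EXTRACTORS: Dict[str, Callable[[List[Event], str], float]] = {}


def extractor(name: str):
    """Decorator that registers a general-purpose extractor under a name."""
    def register(fn):
        EXTRACTORS[name] = fn
        return fn
    return register


@extractor("count")
def count_events(history: List[Event], code: str) -> float:
    return float(sum(1 for e in history if e[2] == code))


@extractor("last_value")
def last_value(history: List[Event], code: str) -> float:
    matching = [e for e in history if e[2] == code]
    return max(matching, key=lambda e: e[1])[3] if matching else float("nan")


# Memoization store keyed by (cohort, extractor name, code).
_cache: Dict[tuple, Dict[str, float]] = {}
stats = {"misses": 0}  # counts how often a feature was actually computed


def extract(events: List[Event], cohort: Cohort, name: str, code: str) -> Dict[str, float]:
    """Return {patient_id: feature value}, computing it at most once per key."""
    key = (cohort, name, code)
    if key not in _cache:
        stats["misses"] += 1
        fn = EXTRACTORS[name]
        _cache[key] = {
            pid: fn([e for e in events if e[0] == pid and e[1] < cohort.index_day], code)
            for pid in sorted(cohort.patient_ids)
        }
    return _cache[key]


# Toy data, invented for illustration only.
EVENTS: List[Event] = [
    ("p1", 1, "hba1c", 7.1),
    ("p1", 5, "hba1c", 8.0),
    ("p2", 2, "hba1c", 6.2),
    ("p1", 12, "hba1c", 9.0),
]
cohort_a = Cohort("early", frozenset({"p1", "p2"}), index_day=10)
cohort_b = Cohort("late", frozenset({"p1"}), index_day=20)

print(extract(EVENTS, cohort_a, "count", "hba1c"))
print(extract(EVENTS, cohort_b, "last_value", "hba1c"))
```

In this sketch the cache assumes the event data does not change between calls; a real system would also need to key on the data source or invalidate entries when it is refreshed.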

Keywords: Feature engineering; electronic health records; longitudinal data.

MeSH terms

  • Cohort Studies
  • Delivery of Health Care / statistics & numerical data
  • Electronic Health Records / organization & administration*
  • Humans
  • Informatics / methods*
  • Machine Learning
  • Risk Factors