Augmentation of Electronic Medical Record Data for Deep Learning

Stud Health Technol Inform. 2022 Jun 6:290:582-586. doi: 10.3233/SHTI220144.

Abstract

Data imbalance is a well-known challenge in the development of machine learning models. It is particularly problematic when the minority class is the class of interest, as is frequently the case in models that predict mortality, specific diagnoses, or other important clinical endpoints. Typical methods of dealing with imbalance include over- or under-sampling the training data, or weighting the loss function to boost the signal from the minority class. Data augmentation is another frequently employed method, particularly for models that take images as input. For discrete time-series data, however, there is no consensus method of data augmentation. We propose a simple data augmentation strategy that can be applied to discrete time-series data from the electronic medical record (EMR). The strategy is then demonstrated on a publicly available dataset, providing proof of concept for the work undertaken in [1], where the data cannot be made publicly available.
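The typical baselines named in the abstract (over-sampling and loss weighting) can be illustrated with a minimal sketch. It is not the augmentation strategy proposed in the paper, which the abstract does not describe. The function names, the toy arrays, and the inverse-frequency weighting formula are assumptions made for illustration only.

```python
import numpy as np


def inverse_frequency_weights(labels):
    """Per-class loss weights: n_samples / (n_classes * class_count).

    Assumed heuristic for illustration; rarer classes get larger weights.
    """
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def random_oversample(features, labels, seed=0):
    """Resample each class with replacement up to the majority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    index_parts = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(labels == cls)
        # Draw only the extra rows needed; zero for the majority class.
        extra = rng.choice(idx, size=target - count, replace=True)
        index_parts.append(np.concatenate([idx, extra]))
    order = np.concatenate(index_parts)
    return features[order], labels[order]


# Hypothetical toy data: 8 majority-class (0) and 2 minority-class (1) samples.
labels = np.array([0] * 8 + [1] * 2)
features = np.arange(20).reshape(10, 2)

weights = inverse_frequency_weights(labels)
X_bal, y_bal = random_oversample(features, labels)
```

In this toy case the minority class receives a weight four times that of the majority class, and the over-sampled set contains eight samples of each class. Over-sampling only repeats existing minority records, whereas augmentation aims to create new, plausible variants of them.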

Keywords: Deep Learning; Electronic Health Records.

MeSH terms

  • Deep Learning*
  • Electronic Health Records*
  • Machine Learning