Augmentation of Electronic Medical Record Data for Deep Learning

Georgina Kennedy; Mark Dras; Blanca Gallego

doi:10.3233/SHTI220144

Stud Health Technol Inform. 2022 Jun 6:290:582-586. doi: 10.3233/SHTI220144.

Authors

Georgina Kennedy¹, Mark Dras², Blanca Gallego¹

Affiliations

¹ Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW, Australia.
² Department of Computing, Macquarie University, Sydney, NSW, Australia.

PMID: 35673083
DOI: 10.3233/SHTI220144

Abstract

Data imbalance is a well-known challenge in the development of machine learning models. It is particularly relevant when the minority class is the class of interest, which is frequently the case in models that predict mortality, specific diagnoses or other important clinical end-points. Typical methods of dealing with imbalance include over- or under-sampling the training data, or weighting the loss function to boost the signal from the minority class. Data augmentation is another frequently employed method, particularly for models that take images as input. For discrete time-series data, however, there is no consensus method of data augmentation. We propose a simple data augmentation strategy that can be applied to discrete time-series data from the electronic medical record (EMR). The strategy is then demonstrated on a publicly available dataset, providing proof of concept for the work undertaken in [1], for which the data cannot be made openly available.
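For readers unfamiliar with the conventional baselines named in the abstract, the following is a minimal sketch of two of them: inverse-frequency class weights for a weighted loss, and random over-sampling of the minority class. It does not implement the augmentation strategy proposed in the paper, which the abstract does not describe. The function names, the toy labels and the weighting formula (n_samples / (n_classes * class_count)) are assumptions chosen for illustration.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class: n_samples / (n_classes * class_count).

    Rarer classes receive larger weights, so their errors count for more
    when the weights are passed to a weighted loss function.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate samples (drawn with replacement) from smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: 8 majority-class (0) and 2 minority-class (1)
# records, each a short sequence of discrete codes.
toy_labels = [0] * 8 + [1] * 2
toy_samples = [[i, i + 1] for i in range(10)]

weights = inverse_frequency_weights(toy_labels)
balanced_samples, balanced_labels = random_oversample(toy_samples, toy_labels)
balanced_counts = Counter(balanced_labels)
```

On the toy data, the minority class receives a weight four times that of the majority class, and over-sampling brings both classes to eight records each. Over-sampling only repeats existing records, which is one reason augmentation, generating new plausible records, is attractive for the minority class.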

Keywords: Deep Learning; Electronic Health Records.

MeSH terms

Deep Learning*
Electronic Health Records*
Machine Learning