Data Engineering for Machine Learning in Women's Imaging and Beyond

AJR Am J Roentgenol. 2019 Jul;213(1):216-226. doi: 10.2214/AJR.18.20464. Epub 2019 Feb 19.

Abstract

OBJECTIVE. Data engineering is the foundation of effective machine learning model development and research. The accuracy and clinical utility of machine learning models fundamentally depend on the quality of the data used for model development. This article aims to provide radiologists and radiology researchers with an understanding of the core elements of data preparation for machine learning research. We cover key concepts from an engineering perspective, including databases, data integrity, and characteristics of data suitable for machine learning projects, and from a clinical perspective, including the Health Insurance Portability and Accountability Act (HIPAA), patient consent, avoidance of bias, and ethical concerns related to the potential to magnify health disparities. The focus of this article is women's imaging; nonetheless, the principles described apply to all domains of medical imaging.

CONCLUSION. Machine learning research is inherently interdisciplinary: effective collaboration is critical for success. In medical imaging, radiologists possess knowledge essential for data engineers to develop useful datasets for machine learning model development.

Keywords: artificial intelligence; breast imaging; data engineering; machine learning; women's imaging.