Toward Complete Structured Information Extraction from Radiology Reports Using Machine Learning

J Digit Imaging. 2019 Aug;32(4):554-564. doi: 10.1007/s10278-019-00234-y.

Abstract

Unstructured and semi-structured radiology reports represent an underutilized trove of information for machine learning (ML)-based clinical informatics applications, including abnormality tracking systems, research cohort identification, point-of-care summarization, semi-automated report writing, and as a source of weak data labels for training image processing systems. Clinical ML systems must be interpretable to ensure user trust. To create interpretable models applicable to all of these tasks, we can build general-purpose systems which extract all relevant human-level assertions or "facts" documented in reports; identifying these facts is an information extraction (IE) task. Previous IE work in radiology has focused on a limited set of information, and extracts isolated entities (i.e., single words such as "lesion" or "cyst") rather than complete facts, which require the linking of multiple entities and modifiers. Here, we develop a prototype system to extract all useful information in abdominopelvic radiology reports (findings, recommendations, clinical history, procedures, imaging indications and limitations, etc.), in the form of complete, contextualized facts. We construct an information schema to capture the bulk of information in reports, develop real-time ML models to extract this information, and demonstrate the feasibility and performance of the system.
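The distinction the abstract draws between isolated entities and complete, contextualized facts can be made concrete with a small sketch. The paper's actual schema is not reproduced here; the class name, fields, and example sentence below are purely illustrative assumptions. The idea is that a fact links a head entity (e.g., "cyst") to its modifiers (size, location, negation), whereas entity-only extraction would return the bare words and lose that context:

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    """A complete, contextualized fact: a head entity plus linked modifiers.
    (Hypothetical structure for illustration; not the paper's schema.)"""
    entity: str
    modifiers: dict = field(default_factory=dict)

# Hypothetical extraction output for an example report sentence:
# "There is a 2.3 cm simple cyst in the left kidney; no suspicious lesion."
facts = [
    Fact("cyst", {"size": "2.3 cm", "descriptor": "simple",
                  "location": "left kidney", "negated": False}),
    Fact("lesion", {"descriptor": "suspicious", "negated": True}),
]

def isolated_entities(facts):
    """What entity-only IE would return: single words with context discarded."""
    return [f.entity for f in facts]
```

Note how entity-only output (`["cyst", "lesion"]`) cannot distinguish an affirmed finding from a negated one, while the fact representation preserves negation, size, and anatomic location for downstream tasks such as abnormality tracking or cohort identification.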

Keywords: Machine learning; Natural language processing; Radiology reports; Structured reporting.

MeSH terms

  • Data Mining
  • Electronic Health Records*
  • Humans
  • Machine Learning*
  • Natural Language Processing
  • Radiology Information Systems*