Closing the loop: automatically identifying abnormal imaging results in scanned documents

Akshat Kumar; Heath Goodrum; Ashley Kim; Carly Stender; Kirk Roberts; Elmer V Bernstam

doi:10.1093/jamia/ocac007

Closing the loop: automatically identifying abnormal imaging results in scanned documents

J Am Med Inform Assoc. 2022 Apr 13;29(5):831-840. doi: 10.1093/jamia/ocac007.

Authors

Akshat Kumar^{1

2}, Heath Goodrum¹, Ashley Kim², Carly Stender², Kirk Roberts¹, Elmer V Bernstam^{1

3}

Affiliations

¹ School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
² McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
³ Division of General Internal Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA.

Abstract

Objectives: Scanned documents (SDs), while common in electronic health records and potentially rich in clinically relevant information, rarely fit well with clinician workflow. Here, we identify scanned imaging reports requiring follow-up with high recall and practically useful precision.

Materials and methods: We focused on identifying imaging findings for 3 common causes of malpractice claims: (1) potentially malignant breast (mammography) and (2) lung (chest computed tomography [CT]) lesions and (3) long-bone fracture (X-ray) reports. We train our ClinicalBERT-based pipeline on existing typed/dictated reports classified manually or using ICD-10 codes, evaluate using a test set of manually classified SDs, and compare against string-matching (baseline approach).

Results: A total of 393 mammograms, 305 chest CT, and 683 bone X-ray reports were manually reviewed. The string-matching approach had an F1 of 0.667. For mammograms, chest CTs, and bone X-rays, respectively: models trained on manually classified training data and optimized for F1 reached an F1 of 0.900, 0.905, and 0.817, while separate models optimized for recall achieved a recall of 1.000 with precisions of 0.727, 0.518, and 0.275. Models trained on ICD-10-labelled data and optimized for F1 achieved F1 scores of 0.647, 0.830, and 0.643, while those optimized for recall achieved a recall of 1.0 with precisions of 0.407, 0.683, and 0.358.

Discussion: Our pipeline can identify abnormal reports with potentially useful performance and so decrease the manual effort required to screen for abnormal findings that require follow-up.

Conclusion: It is possible to automatically identify clinically significant abnormalities in SDs with high recall and practically useful precision in a generalizable and minimally laborious way.

Keywords: classification; electronic health records; machine learning; natural language processing; radiology.

Closing the loop: automatically identifying abnormal imaging results in scanned documents

Authors

Affiliations

Abstract

Publication types

MeSH terms

Grants and funding