Using case-level context to classify cancer pathology reports

Shang Gao; Mohammed Alawad; Noah Schaefferkoetter; Lynne Penberthy; Xiao-Cheng Wu; Eric B Durbin; Linda Coyle; Arvind Ramanathan; Georgia Tourassi

doi:10.1371/journal.pone.0232840

PLoS One. 2020 May 12;15(5):e0232840. doi: 10.1371/journal.pone.0232840. eCollection 2020.

Authors

Shang Gao¹, Mohammed Alawad¹, Noah Schaefferkoetter¹, Lynne Penberthy², Xiao-Cheng Wu³, Eric B Durbin⁴, Linda Coyle⁵, Arvind Ramanathan⁶, Georgia Tourassi¹

Affiliations

¹ Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States of America.
² Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, United States of America.
³ Louisiana Tumor Registry, Louisiana State University Health Sciences Center School of Public Health, New Orleans, LA, United States of America.
⁴ Kentucky Cancer Registry, University of Kentucky, Lexington, KY, United States of America.
⁵ Information Management Services Inc, Calverton, MD, United States of America.
⁶ Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, United States of America.

Abstract

Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence; for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information regarding the entire cancer case based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports, and we show that incorporating case-level context significantly boosts classification accuracy across six classification tasks: site, subsite, laterality, histology, behavior, and grade. We expect that, with minimal modifications, our add-on can be applied to a wide range of other clinical text-based tasks.
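
The abstract does not specify how the add-on combines information across reports, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each report has already been encoded as a fixed-length vector by some existing per-report classifier, and it uses a running mean over the reports seen so far in the case as a stand-in for case-level context. The function name `add_case_context` and the running-mean aggregation are hypothetical choices made for this example.

```python
from typing import List

Vector = List[float]


def add_case_context(report_vectors: List[Vector]) -> List[Vector]:
    """Append a simple case-level context vector to each report vector.

    report_vectors holds one vector per report, in chronological order
    within a single cancer case. The context for report i is the mean of
    reports 1..i (an assumed aggregation, chosen only for illustration).
    """
    if not report_vectors:
        return []

    dim = len(report_vectors[0])
    running_sum = [0.0] * dim
    augmented: List[Vector] = []

    for count, vec in enumerate(report_vectors, start=1):
        # Accumulate the sum of all report vectors seen so far in this case.
        running_sum = [s + v for s, v in zip(running_sum, vec)]
        # Mean over the reports up to and including the current one.
        context = [s / count for s in running_sum]
        # The report's own vector followed by the case-level context.
        augmented.append(list(vec) + context)

    return augmented


# Two reports from the same hypothetical case, each a 2-dimensional vector.
case = [[1.0, 0.0], [3.0, 2.0]]
augmented_case = add_case_context(case)
```

Under these assumptions, a downstream classification layer would consume vectors of twice the original width, while the per-report encoder itself stays unchanged, which is the sense in which such a component can be treated as modular.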

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Electronic Health Records / classification*
Histological Techniques
Humans
Natural Language Processing
Neoplasms / pathology*
SEER Program

Grants and funding

P30 CA177558/CA/NCI NIH HHS/United States