Accurately Identifying Cerebroarterial Stenosis from Angiography Reports Using Natural Language Processing Approaches

Ching-Heng Lin; Kai-Cheng Hsu; Chih-Kuang Liang; Tsong-Hai Lee; Ching-Sen Shih; Yang C Fann

doi:10.3390/diagnostics12081882

Diagnostics (Basel). 2022 Aug 3;12(8):1882. doi: 10.3390/diagnostics12081882.

Authors

Ching-Heng Lin¹ ² ³, Kai-Cheng Hsu³ ⁴ ⁵ ⁶, Chih-Kuang Liang³ ⁷ ⁸ ⁹, Tsong-Hai Lee¹⁰ ¹¹, Ching-Sen Shih⁸, Yang C Fann³

Affiliations

¹ Center for Artificial Intelligence in Medicine, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan.
² Bachelor Program in Artificial Intelligence, Chang Gung University, Taoyuan 33305, Taiwan.
³ Bioinformatics Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA.
⁴ Department of Medicine, China Medical University, Taichung 40447, Taiwan.
⁵ Artificial Intelligence Center for Medical Diagnosis, China Medical University Hospital, Taichung 40402, Taiwan.
⁶ Department of Neurology, China Medical University Hospital, Taichung 40402, Taiwan.
⁷ Center for Geriatrics and Gerontology, Kaohsiung Veterans General Hospital, Kaohsiung 81362, Taiwan.
⁸ Division of Neurology, Department of Medicine, Kaohsiung Veterans General Hospital, Kaohsiung 81362, Taiwan.
⁹ Aging and Health Research Center, National Yang Ming University, Taipei 11221, Taiwan.
¹⁰ Stroke Center and Department of Neurology, Chang Gung Memorial Hospital, Linkou Medical Center, Taoyuan 33333, Taiwan.
¹¹ College of Medicine, Chang Gung University, Taoyuan 33302, Taiwan.

Abstract

Patients with intracranial artery stenosis show a high incidence of stroke. Angiography reports contain rich but underutilized information that can enable the detection of cerebrovascular diseases. This study evaluated several natural language processing (NLP) techniques for accurately identifying stenosis in eleven intracranial arteries from angiography reports. Three NLP models were developed and evaluated by internal-external cross-validation: a rule-based model, a recurrent neural network (RNN), and a contextualized language model, XLNet. Angiography reports from two independent medical centers were assessed (9614 for training and internal validation and testing; 315 for external validation). The internal testing results showed that XLNet had the best performance, with an area under the receiver operating characteristic curve (AUROC) ranging from 0.97 to 0.99 across the eleven targeted arteries. The rule-based model attained an AUROC from 0.92 to 0.96, and the RNN long short-term memory model attained an AUROC from 0.95 to 0.97. The study showed the potential of NLP techniques such as the XLNet model for routine, automatic screening of patients at high risk of intracranial artery stenosis using angiography reports. However, the NLP models were investigated on relatively small samples with very different report writing styles and stenosis prevalence distributions, revealing challenges for model generalization.

Keywords: cerebrovascular diseases; deep learning; intracranial artery stenosis; natural language processing; rule-based model.
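
To make the idea of a rule-based labeler concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not list the eleven arteries or the rules used, so the artery vocabulary (ARTERY_PATTERNS), the stenosis and negation keyword lists, and the function name label_report are all hypothetical. Negation is handled crudely at the sentence level, which a real system would likely need to refine.

```python
import re

# Hypothetical artery vocabulary: the study targets eleven arteries,
# which the abstract does not enumerate. Three are shown for illustration.
ARTERY_PATTERNS = {
    "MCA": r"\b(?:middle cerebral arter(?:y|ies)|MCA)\b",
    "ICA": r"\b(?:internal carotid arter(?:y|ies)|ICA)\b",
    "BA": r"\b(?:basilar artery|BA)\b",
}

# Assumed keyword lists; real reports would need a richer lexicon.
STENOSIS = re.compile(r"\b(?:stenosis|stenoses|stenotic|narrowing)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:no|without|negative for|absence of)\b", re.IGNORECASE)


def label_report(report):
    """Return a {artery: 0 or 1} dict flagging arteries with a stenosis mention."""
    labels = {artery: 0 for artery in ARTERY_PATTERNS}
    # Split into rough sentences at whitespace following '.' or ';'.
    for sentence in re.split(r"(?<=[.;])\s+", report):
        # Skip sentences with no stenosis term or with any negation cue.
        if not STENOSIS.search(sentence) or NEGATION.search(sentence):
            continue
        for artery, pattern in ARTERY_PATTERNS.items():
            if re.search(pattern, sentence, re.IGNORECASE):
                labels[artery] = 1
    return labels


example = (
    "Severe stenosis of the right middle cerebral artery. "
    "No stenosis in the basilar artery. The ICA is patent."
)
result = label_report(example)
```

In this example only the middle cerebral artery is flagged: the basilar artery sentence is dropped by the negation rule, and the ICA sentence contains no stenosis term. Per-artery binary labels of this kind are the sort of output that a per-artery AUROC evaluation would be computed against.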

Grants and funding