Cohort selection for clinical trials using hierarchical neural network

J Am Med Inform Assoc. 2019 Nov 1;26(11):1203-1208. doi: 10.1093/jamia/ocz099.

Authors

Ying Xiong¹, Xue Shi¹, Shuai Chen¹, Dehuan Jiang¹, Buzhou Tang¹, Xiaolong Wang¹, Qingcai Chen¹, Jun Yan²

Affiliations

¹ Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China.
² Yidu Cloud (Beijing) Technology Co., Ltd, Beijing, China.

Abstract

Objective: Cohort selection is a key step in clinical trial research. We proposed a hierarchical neural network to determine whether a patient satisfies the selection criteria.

Materials and methods: We designed a hierarchical neural network (denoted as CNN-Highway-LSTM or LSTM-Highway-LSTM) for track 1 of the 2018 national natural language processing (NLP) clinical challenge (n2c2) on cohort selection for clinical trials. The neural network is composed of 5 components: (1) sentence representation using a convolutional neural network (CNN) or a long short-term memory (LSTM) network; (2) a highway network to adjust information flow; (3) a self-attention neural network to reweight sentences; (4) document representation using an LSTM, which takes sentence representations in chronological order as input; and (5) a fully connected neural network to determine whether each criterion is met. We compared the proposed method with its variants: methods that use only the first component to represent documents directly, followed by the fully connected neural network for classification (denoted as CNN-only or LSTM-only), and methods without the highway network (denoted as CNN-LSTM or LSTM-LSTM). The performance of all methods was measured by micro-averaged precision, recall, and F1 score.
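Components (2) and (3) above can be illustrated with a small sketch. The abstract does not give the equations used, so everything here is an assumption: the highway layer follows the commonly used gated form y = T(x) * H(x) + (1 - T(x)) * x with a tanh transform and a sigmoid gate, and the sentence reweighting uses dot-product scores against a single query vector followed by a softmax. The function names and parameters are hypothetical and are not taken from the authors' system.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, x):
    # Computes W @ x + b for a weight matrix given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway(x, W_h, b_h, W_t, b_t):
    # Assumed highway form: the gate t mixes a transformed copy of x
    # with x itself, element by element.
    h = [math.tanh(v) for v in affine(W_h, b_h, x)]
    t = [sigmoid(v) for v in affine(W_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def attention_reweight(sentences, query):
    # Assumed scoring: dot product of each sentence vector with a query
    # vector, normalized with a numerically stable softmax.
    scores = [sum(q * s for q, s in zip(query, sent)) for sent in sentences]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    reweighted = [[w * v for v in sent] for w, sent in zip(weights, sentences)]
    return weights, reweighted


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    sentence = [0.5, -1.0]
    # A strongly negative gate bias closes the gate, so the input passes through.
    print(highway(sentence, identity, [0.0, 0.0], zeros, [-50.0, -50.0]))
    weights, _ = attention_reweight([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 1.0])
    print(weights)
```

In a full model along the lines described, the reweighted sentence vectors would then be fed in chronological order to the document-level LSTM; that step is omitted here to keep the sketch free of external dependencies.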

Results: The micro-averaged F1 scores of CNN-only, LSTM-only, CNN-LSTM, LSTM-LSTM, CNN-Highway-LSTM, and LSTM-Highway-LSTM were 85.24%, 84.25%, 87.27%, 88.68%, 88.48%, and 90.21%, respectively. The highest micro-averaged F1 score (90.21%) is higher than our submitted result of 88.55%, which was one of the top-ranked results in the challenge. These results indicate that the proposed method is effective for cohort selection for clinical trials.
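For readers unfamiliar with micro-averaging, the sketch below shows the general idea: true positives, false positives, and false negatives are pooled across all criteria before precision, recall, and F1 are computed, so criteria with more instances carry more weight. This is a generic illustration and not the challenge's official evaluation script; the function name, the criterion names, and the counts are hypothetical.

```python
def micro_prf(counts):
    # counts maps each criterion to its "tp", "fp", and "fn" tallies.
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    # Guard against empty denominators by returning 0.0 in that case.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical tallies for two made-up criteria.
    example = {
        "criterion_a": {"tp": 8, "fp": 2, "fn": 2},
        "criterion_b": {"tp": 1, "fp": 1, "fn": 3},
    }
    print(micro_prf(example))
```

Because the counts are pooled, a rare criterion with poor performance lowers the micro-averaged score less than it would lower a macro-average, which is one reason imbalanced criteria can be hidden by this metric.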

Discussion: Although the proposed method achieved promising results, some errors were caused by word ambiguity, negation, numerical analysis, and an incomplete dictionary. Moreover, imbalanced data is another challenge that needs to be tackled in the future.

Conclusion: In this article, we proposed a hierarchical neural network for cohort selection. Experimental results show that this method is effective at selecting cohorts.

Keywords: classification; clinical trials; cohort selection; hierarchical neural network; mental health records.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Clinical Trials as Topic / methods*
Data Mining / methods*
Electronic Data Processing
Electronic Health Records
Humans
Machine Learning
Mental Health
Natural Language Processing
Neural Networks, Computer*
Patient Selection*
