Cohort selection for clinical trials using hierarchical neural network

J Am Med Inform Assoc. 2019 Nov 1;26(11):1203-1208. doi: 10.1093/jamia/ocz099.

Abstract

Objective: Cohort selection for clinical trials is a key step in clinical research. We proposed a hierarchical neural network to determine whether a patient satisfies each selection criterion.

Materials and methods: We designed a hierarchical neural network (denoted as CNN-Highway-LSTM or LSTM-Highway-LSTM) for track 1 of the 2018 national natural language processing (NLP) clinical challenge (n2c2) on cohort selection for clinical trials. The neural network is composed of 5 components: (1) sentence representation using a convolutional neural network (CNN) or a long short-term memory (LSTM) network; (2) a highway network to adjust information flow; (3) a self-attention neural network to reweight sentences; (4) document representation using an LSTM, which takes sentence representations in chronological order as input; and (5) a fully connected neural network to determine whether each criterion is met. We compared the proposed method with its variants: methods that use only the first component to represent documents directly, followed by the fully connected neural network for classification (denoted as CNN-only or LSTM-only), and methods without the highway network (denoted as CNN-LSTM or LSTM-LSTM). The performance of all methods was measured by micro-averaged precision, recall, and F1 score.
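Component (2), the highway network, can be illustrated with a minimal sketch. It assumes the standard highway formulation, y = T(x) * H(x) + (1 - T(x)) * x, with a tanh transform and a sigmoid gate; the abstract does not state the exact variant, activation, or dimensions the authors used, so the function names, weights, and sizes below are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, used here for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Standard highway layer (an assumption, not the paper's stated form).

    The gate T(x) decides, per dimension, how much of the transformed
    vector H(x) to use and how much of the input x to carry through.
    """
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]  # candidate H(x)
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]    # transform gate T(x)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


# Hypothetical 2-dimensional sentence vector and identity weights.
x = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [0.0, 0.0]

# Strongly negative gate bias: the layer carries the input through.
carried = highway_layer(x, identity, zero, identity, [-50.0, -50.0])

# Strongly positive gate bias: the layer outputs the transform tanh(x).
transformed = highway_layer(x, identity, zero, identity, [50.0, 50.0])
```

With the gate closed the sentence vector passes through almost unchanged, and with the gate open it is replaced by its transform; a trained gate sits between these extremes, which is one way to read "adjust information flow".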

Results: The micro-averaged F1 scores of CNN-only, LSTM-only, CNN-LSTM, LSTM-LSTM, CNN-Highway-LSTM, and LSTM-Highway-LSTM were 85.24%, 84.25%, 87.27%, 88.68%, 88.48%, and 90.21%, respectively. The highest micro-averaged F1 score exceeds our submitted result of 88.55%, which was one of the top-ranked results in the challenge. The results indicate that the proposed method is effective for cohort selection for clinical trials.
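As a reminder of what micro-averaging means, the sketch below pools true positives, false positives, and false negatives over all criteria before computing precision, recall, and F1. It is a generic illustration, not the challenge's evaluation script; the function name and the counts are made up for the example and are not taken from the paper.

```python
def micro_prf(counts):
    """Micro-averaged precision, recall, and F1.

    counts: list of (tp, fp, fn) tuples, one per criterion.
    Counts are summed across criteria first, so frequent criteria
    weigh more than rare ones.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts for two criteria: one frequent, one rare.
example_counts = [(8, 2, 1), (1, 0, 3)]
precision, recall, f1 = micro_prf(example_counts)
```

Here the pooled counts are 9 true positives, 2 false positives, and 4 false negatives, so the rare criterion's poor recall lowers the overall score only in proportion to its share of the instances.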

Discussion: Although the proposed method achieved promising results, some errors were caused by word ambiguity, negation, number analysis, and an incomplete dictionary. Moreover, imbalanced data is another challenge that needs to be tackled in the future.

Conclusion: In this article, we proposed a hierarchical neural network for cohort selection. Experimental results show that the method is effective at selecting cohorts.

Keywords: classification; clinical trials; cohort selection; hierarchical neural network; mental health records.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Clinical Trials as Topic / methods*
  • Data Mining / methods*
  • Electronic Data Processing
  • Electronic Health Records
  • Humans
  • Machine Learning
  • Mental Health
  • Natural Language Processing
  • Neural Networks, Computer*
  • Patient Selection*