Integrating Language Model and Reading Control Gate in BLSTM-CRF for Biomedical Named Entity Recognition

IEEE/ACM Trans Comput Biol Bioinform. 2020 May-Jun;17(3):841-846. doi: 10.1109/TCBB.2018.2868346. Epub 2018 Sep 3.

Abstract

Biomedical named entity recognition (Bio-NER) is an important preliminary step for many biomedical text mining tasks. Current mainstream NER methods are based on neural networks, which avoid the complex hand-designed features derived from various linguistic analyses. However, these methods overlook potentially useful sentence-level semantic information as well as general semantic and syntactic features. We therefore propose a novel bidirectional Long Short-Term Memory (LSTM) network model that integrates a language model and a sentence-level reading control gate (LS-BLSTM-CRF) for Bio-NER. In our model, a sentence-level reading control gate (SC) is inserted into the network to integrate the implicit meaning of the entire sentence, and a language model is incorporated to learn richer latent features. In addition, character-level embeddings are used as input to handle out-of-vocabulary words. Experiments on the BioCreative II GM corpus show that our method achieves an F-score of 89.94 percent, which outperforms all state-of-the-art systems and is 1.33 percent higher than the best-performing neural network.
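The abstract places a CRF output layer on top of the BLSTM but gives no decoding details. For reference, here is a minimal sketch of standard Viterbi decoding for a generic linear-chain CRF over BIO tags. It is not the authors' implementation. The `viterbi_decode` helper, the `O`/`B-GENE`/`I-GENE` tag set, and every score below are hypothetical. In a BLSTM-CRF, the per-token emission scores would come from the recurrent layers and the transition matrix would be learned during training.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its score (linear-chain CRF).

    emissions[t][j]:   score of tag j at token t (assumed non-empty input).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    # best[j] holds the score of the best path ending in tag j at the current token.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # Pick the previous tag i that maximizes path score + transition i -> j.
            scores = [best[i] + transitions[i][j] for i in range(n_tags)]
            i_best = max(range(n_tags), key=scores.__getitem__)
            new_best.append(scores[i_best] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the full path.
    last = max(range(n_tags), key=best.__getitem__)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical example: tags for a gene-mention task in BIO format.
TAGS = ["O", "B-GENE", "I-GENE"]
NEG = -1e4  # large penalty that effectively forbids the O -> I-GENE transition
transitions = [
    [0.0, 0.0, NEG],  # from O
    [0.0, 0.0, 0.5],  # from B-GENE
    [0.0, 0.0, 0.5],  # from I-GENE
]
# Made-up emission scores for the tokens "the", "p53", "protein".
emissions = [
    [2.0, 0.5, 0.1],
    [0.2, 1.5, 1.0],
    [0.3, 0.2, 1.2],
]

path, score = viterbi_decode(emissions, transitions)
print([TAGS[i] for i in path], round(score, 2))  # ['O', 'B-GENE', 'I-GENE'] 5.2
```

The transition matrix is what lets a CRF layer enforce label consistency. Here the penalty on `O -> I-GENE` keeps the decoder from starting an entity with an inside tag, which independent per-token softmax predictions cannot guarantee.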

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology
  • Data Mining / methods*
  • Deep Learning
  • Natural Language Processing
  • Neural Networks, Computer*
  • Pattern Recognition, Automated / methods*
  • Semantics