Comparison of named entity recognition methodologies in biomedical documents

Biomed Eng Online. 2018 Nov 6;17(Suppl 2):158. doi: 10.1186/s12938-018-0573-6.

Abstract

Background: Biomedical named entity recognition (Bio-NER) is a fundamental task in processing biomedical text: it identifies terms such as RNA, protein, cell type, cell line, and DNA. It is one of the most elementary and central tasks in biomedical knowledge discovery from text. The system described here was developed using the data of the BioNLP/NLPBA 2004 shared task. Experiments were conducted on the training and evaluation sets provided by the task organizers.

Results: Compared with a baseline F1 score of 70.09%, the Jordan-type and Elman-type RNN algorithms achieved F1 scores of approximately 60.53% and 58.80%, respectively. With CRF as the machine learning algorithm, the CCA, GloVe, and Word2Vec word embeddings achieved F1 scores of 72.73%, 72.74%, and 72.82%, respectively.
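For readers unfamiliar with how an F1 score is obtained for entity recognition, the following is a minimal sketch, not the authors' evaluation script. It assumes BIO-style tags and exact-match scoring of entity spans; the function names and the example tag sequences are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1: harmonic mean of precision and recall."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the predicted tags mislabel the DNA mention as RNA.
gold = ["B-protein", "I-protein", "O", "B-DNA", "O", "B-cell_type"]
pred = ["B-protein", "I-protein", "O", "B-RNA", "O", "B-cell_type"]
print(f"F1 = {entity_f1(gold, pred):.4f}")
```

In this sketch a predicted entity counts as correct only if both its boundaries and its type match the gold annotation, so a single mislabeled mention lowers precision and recall together.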

Conclusions: Using word embeddings constructed through unsupervised learning can reduce the time and cost required to construct the training data.

Keywords: Biomedical named entity recognition (Bio-NER); Conditional random fields (CRFs); Recurrent neural network (RNN); Word embedding.

Publication types

  • Comparative Study

MeSH terms

  • Biomedical Research*
  • Data Mining / methods*
  • Documentation*
  • Neural Networks, Computer