Comparison of named entity recognition methodologies in biomedical documents

Hye-Jeong Song; Byeong-Cheol Jo; Chan-Young Park; Jong-Dae Kim; Yu-Seop Kim

doi:10.1186/s12938-018-0573-6

Biomed Eng Online. 2018 Nov 6;17(Suppl 2):158. doi: 10.1186/s12938-018-0573-6.

Authors

Hye-Jeong Song^{1,2}, Byeong-Cheol Jo^{1,2}, Chan-Young Park^{1,2}, Jong-Dae Kim^{1,2}, Yu-Seop Kim^{3,4}

Affiliations

¹ School of Software, Hallym University, Chuncheon, South Korea.
² Bio-IT Research Center, Hallym University, Chuncheon, South Korea.
³ School of Software, Hallym University, Chuncheon, South Korea. yskim01@hallym.ac.kr.
⁴ Bio-IT Research Center, Hallym University, Chuncheon, South Korea. yskim01@hallym.ac.kr.

Abstract

Background: Biomedical named entity recognition (Bio-NER) is a fundamental task in processing biomedical text. It identifies mentions of entity types such as RNA, protein, cell type, cell line, and DNA. Bio-NER is one of the most elementary and central tasks in biomedical knowledge discovery from text. The system described here was developed on the BioNLP/NLPBA 2004 shared task, and experiments were conducted on the training and evaluation sets provided by the task organizers.
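Bio-NER is usually cast as sequence labeling. Each token receives a tag that marks whether it begins (B-), continues (I-), or lies outside (O) an entity of one of the classes above. The sketch assumes this IOB2 scheme and entities given as token-index spans. The helper `spans_to_iob2`, the exact spelling of the class labels, and the example sentence are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

# The five entity classes named above (label spelling is an assumption).
ENTITY_TYPES = {"protein", "DNA", "RNA", "cell_type", "cell_line"}


def spans_to_iob2(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, type) entity spans into IOB2 tags.

    Hypothetical helper for illustration; spans are assumed not to overlap.
    """
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {etype}")
        tags[start] = f"B-{etype}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"      # continuation tokens
    return tags


# Hypothetical example sentence with a DNA mention and a cell-type mention.
tokens = ["IL-2", "gene", "expression", "in", "T", "cells"]
tags = spans_to_iob2(len(tokens), [(0, 2, "DNA"), (4, 6, "cell_type")])
for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```

A tagger such as a CRF or an RNN then predicts one of these tags per token, and entities are read back from contiguous B-/I- runs.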

Results: Compared with a baseline F1 score of 70.09%, the Jordan- and Elman-type RNN algorithms achieve F1 scores of approximately 60.53% and 58.80%, respectively. When CRF is used as the machine learning algorithm, the CCA, GloVe, and Word2Vec word embeddings yield F1 scores of 72.73%, 72.74%, and 72.82%, respectively.
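The F1 scores above combine precision and recall at the entity level. As a minimal sketch, assuming exact-match scoring (a predicted entity counts only if its type and both boundaries match a gold entity) over IOB2 tags, F1 = 2PR / (P + R) can be computed as follows. The function names and toy tags are hypothetical, and this is not the official shared-task evaluation script.

```python
from typing import List, Optional, Set, Tuple

Entity = Tuple[int, int, str]  # (start, end_exclusive, type)


def iob2_to_spans(tags: List[str]) -> Set[Entity]:
    """Extract entity spans from an IOB2 tag sequence."""
    spans: Set[Entity] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):      # sentinel closes a trailing entity
        if tag.startswith("I-") and etype == tag[2:]:
            continue                            # same entity continues
        if etype is not None:
            spans.add((start, i, etype))        # close the open entity
            start, etype = None, None
        if tag.startswith(("B-", "I-")):
            # a stray I- without a matching open entity is treated as a start
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity precision, recall and F1, as fractions."""
    g, p = iob2_to_spans(gold), iob2_to_spans(pred)
    tp = len(g & p)                             # entities matching type and boundaries
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = ["B-DNA", "I-DNA", "O", "O", "B-cell_type", "I-cell_type"]
pred = ["B-DNA", "O", "O", "O", "B-cell_type", "I-cell_type"]
print(entity_f1(gold, pred))
```

In the toy example the truncated DNA entity counts as both a false positive and a false negative, so a single boundary error lowers precision and recall alike and the call prints `(0.5, 0.5, 0.5)`.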

Conclusions: Using word embeddings constructed through unsupervised learning can save the time and cost required to construct training data.
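The abstract does not say how the embeddings are fed to the CRF. One common approach, sketched here as an assumption, is to append the dimensions of a pretrained word vector to each token's hand-crafted features. Because the vectors are learned from unlabeled text, they add information without extra annotation. The feature names, `token_features`, and the toy 3-dimensional vectors are hypothetical. The resulting per-token dictionaries mix string and float values, which is the input format accepted by CRFsuite wrappers such as sklearn-crfsuite.

```python
from typing import Dict, List, Sequence, Union

Feature = Union[str, float]


def token_features(tokens: List[str], i: int,
                   embeddings: Dict[str, Sequence[float]]) -> Dict[str, Feature]:
    """Build a feature dict for token i: a few surface features plus the
    dimensions of a pretrained word vector (hypothetical feature scheme)."""
    word = tokens[i]
    feats: Dict[str, Feature] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.suffix3": word[-3:],
        "word.has_digit": float(any(c.isdigit() for c in word)),
        "word.is_upper": float(word.isupper()),
    }
    vec = embeddings.get(word.lower())
    if vec is not None:  # out-of-vocabulary words get no embedding features
        for d, value in enumerate(vec):
            feats[f"emb_{d}"] = float(value)
    return feats


# Made-up 3-d vectors standing in for CCA, GloVe, or Word2Vec output.
toy_embeddings = {"gene": [0.12, -0.40, 0.05]}
print(token_features(["IL-2", "gene"], 1, toy_embeddings))
```

Swapping in a different embedding (CCA, GloVe, or Word2Vec) changes only the `embeddings` lookup table, which is what makes such feature sets easy to compare under the same CRF.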

Keywords: Biomedical named entity recognition (Bio-NER); Conditional random fields (CRFs); Recurrent neural network (RNN); Word embedding.

Publication types

Comparative Study

MeSH terms

Biomedical Research*
Data Mining / methods*
Documentation*
Neural Networks, Computer