PKDE4J: Entity and relation extraction for public knowledge discovery

Min Song; Won Chul Kim; Dahee Lee; Go Eun Heo; Keun Young Kang

doi:10.1016/j.jbi.2015.08.008

J Biomed Inform. 2015 Oct:57:320-32. doi: 10.1016/j.jbi.2015.08.008. Epub 2015 Aug 12.

Authors

Min Song¹, Won Chul Kim², Dahee Lee³, Go Eun Heo⁴, Keun Young Kang⁵

Affiliations

¹ Department of Library and Information Science, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, Republic of Korea. Electronic address: min.song@yonsei.ac.kr.
² Department of Library and Information Science, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, Republic of Korea. Electronic address: krevas@yonsei.ac.kr.
³ Department of Library and Information Science, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, Republic of Korea. Electronic address: leedahee@yonsei.ac.kr.
⁴ Department of Library and Information Science, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, Republic of Korea. Electronic address: goeun.heo@yonsei.ac.kr.
⁵ Department of Library and Information Science, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, Republic of Korea. Electronic address: ky.kang@yonsei.ac.kr.

PMID: 26277115
DOI: 10.1016/j.jbi.2015.08.008

Abstract

Because the volume of scientific publications far exceeds what can be handled manually, interest is rising in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. Most previous studies have focused on the design and performance improvement of either named entity recognition or relation extraction. In this paper, we present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. Building on Stanford CoreNLP, we developed the system to cope with multiple types of entities and relations. The system is reasonably accurate and lets users configure its text-processing components. We evaluated it on many corpora and found that it surpasses existing systems, with average F-measures of 85% for entity extraction and 81% for relation extraction.
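
To make the two stages named in the abstract concrete, the following is a minimal sketch of dictionary-based entity extraction followed by a single rule-based relation pattern. It is not PKDE4J's implementation (which is built on Stanford CoreNLP); the dictionary entries, trigger verbs, relation labels, and function names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy dictionary: surface form -> entity type (not from PKDE4J).
ENTITY_DICT = {
    "aspirin": "DRUG",
    "cox-1": "GENE",
    "headache": "DISEASE",
}

# Hypothetical trigger verbs mapped to relation labels.
RELATION_TRIGGERS = {
    "inhibits": "INHIBITS",
    "treats": "TREATS",
}


def extract_entities(sentence):
    """Return (start, end, text, type) for each dictionary term found."""
    found = []
    for term, etype in ENTITY_DICT.items():
        # Match the term only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            found.append((m.start(), m.end(), m.group(0), etype))
    return sorted(found)


def extract_relations(sentence):
    """Apply one rule: entity, trigger verb, entity, in that order."""
    entities = extract_entities(sentence)
    relations = []
    # Only adjacent entity pairs are considered in this sketch.
    for left, right in zip(entities, entities[1:]):
        between = sentence[left[1]:right[0]].lower()
        for trigger, label in RELATION_TRIGGERS.items():
            if re.search(r"\b" + re.escape(trigger) + r"\b", between):
                relations.append((left[2], label, right[2]))
    return relations
```

Calling `extract_relations("Aspirin inhibits COX-1.")` yields one triple linking the two dictionary hits through the trigger verb. A surface rule like this ignores syntax, so it can mislabel pairs in sentences with coordination or negation; a real system would typically consult parse information before accepting a match.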

Keywords: Information extraction; Named entity recognition; Relation extraction; Text mining.

MeSH terms

Data Mining*
Knowledge
Knowledge Discovery*
Periodicals as Topic
Publications