Rich features based Conditional Random Fields for biological named entities recognition

Chengjie Sun; Yi Guan; Xiaolong Wang; Lei Lin

doi:10.1016/j.compbiomed.2006.12.002

Comput Biol Med. 2007 Sep;37(9):1327-33. doi: 10.1016/j.compbiomed.2006.12.002. Epub 2007 Jan 19.

Authors

Chengjie Sun¹, Yi Guan, Xiaolong Wang, Lei Lin

Affiliation

¹ School of Computer Science, Harbin Institute of Technology, Mailbox 319, West Da-zhi Street 92, Harbin, Heilongjiang 150001, China. cjsun@insun.hit.edu.cn

PMID: 17239841

Abstract

Biological named entity recognition is a critical task for automatically mining knowledge from biological literature. In this paper, the task is cast as a sequence labeling problem and a Conditional Random Fields (CRF) model is introduced to solve it. Within the CRF framework, rich features are used, including literal, contextual, and semantic features. Among these, shallow syntactic features are introduced for the first time and effectively improve the model's performance. Experiments show that our method achieves an F-measure of 71.2% on an open evaluation data set, which is better than most state-of-the-art systems.
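To make the sequence labeling formulation concrete, the following is a minimal sketch of how each token might be turned into a feature dictionary and how predicted labels might be decoded into entity spans. It is an illustration only, not the authors' implementation: the function names, the feature names, the BIO tagging scheme, the entity types, and the use of chunk tags as the shallow syntactic feature are all assumptions that the abstract does not specify. Semantic features are omitted because the abstract does not say how they are derived.

```python
def token_features(tokens, chunk_tags, i):
    """Build a feature dict for token i (hypothetical feature set)."""
    word = tokens[i]
    feats = {
        # Literal features: surface properties of the token itself.
        "word.lower": word.lower(),
        "has_digit": any(c.isdigit() for c in word),
        "has_upper": any(c.isupper() for c in word),
        "has_hyphen": "-" in word,
        "suffix3": word[-3:].lower(),
        # Shallow syntactic feature: assumed here to be a chunk tag.
        "chunk": chunk_tags[i],
    }
    # Context features: neighbouring tokens, with boundary markers.
    feats["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next.lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def bio_to_spans(labels):
    """Decode BIO labels into (start, end_exclusive, type) entity spans."""
    spans, start, etype = [], None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, lab in enumerate(list(labels) + ["O"]):
        continues = lab.startswith("I-") and start is not None and lab[2:] == etype
        if continues:
            continue
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if lab != "O":
            start, etype = i, lab[2:]
    return spans


# Illustrative sentence and labels (made up for this sketch).
tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B", "."]
chunk_tags = ["B-NP", "I-NP", "I-NP", "B-VP", "B-NP", "I-NP", "O"]
labels = ["B-DNA", "I-DNA", "O", "O", "B-protein", "I-protein", "O"]

features = [token_features(tokens, chunk_tags, i) for i in range(len(tokens))]
spans = bio_to_spans(labels)
```

In a full system, the per-token feature dictionaries would be passed to a CRF trainer, and the spans decoded from predicted labels would be compared with gold-standard spans to compute precision, recall, and F-measure.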

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Artificial Intelligence
Biomedical Research / methods*
Information Storage and Retrieval / methods*
Information Systems*
MEDLINE
Models, Statistical*
Pattern Recognition, Automated / methods*
Terminology as Topic