An analysis of entity normalization evaluation biases in specialized domains

BMC Bioinformatics. 2023 Jun 2;24(1):227. doi: 10.1186/s12859-023-05350-9.

Abstract

Background: Entity normalization is an important information extraction task that has recently gained attention, particularly in the clinical/biomedical and life science domains. State-of-the-art methods perform rather well on several popular benchmark datasets. Yet, we argue that the task is far from resolved.

Results: We selected two gold standard corpora and two state-of-the-art methods to highlight some evaluation biases. We present initial, non-exhaustive findings on evaluation problems in the entity normalization task.

Conclusions: Our analysis suggests better evaluation practices to support methodological research in this field.

Keywords: Ablation study; Corpus; Dataset; Entity normalization; Evaluation.

MeSH terms

  • Bias
  • Biological Science Disciplines*
  • Information Storage and Retrieval*
  • Natural Language Processing
  • Research Design