USI: a fast and accurate approach for conceptual document annotation

BMC Bioinformatics. 2015 Mar 14:16:83. doi: 10.1186/s12859-015-0513-4.

Abstract

Background: Semantic approaches such as concept-based information retrieval rely on a corpus in which resources are indexed by concepts belonging to a domain ontology. In order to keep such applications up-to-date, new entities need to be frequently annotated to enrich the corpus. However, this task is time-consuming and requires a high-level of expertise in both the domain and the related ontology. Different strategies have thus been proposed to ease this indexing process, each one taking advantage from the features of the document.

Results: In this paper we present USI (User-oriented Semantic Indexer), a fast and intuitive method for indexing tasks. We introduce a solution to suggest a conceptual annotation for new entities based on related already indexed documents. Our results, compared to those obtained by previous authors using the MeSH thesaurus and a dataset of biomedical papers, show that the method surpasses text-specific methods in terms of both quality and speed. Evaluations are done via usual metrics and semantic similarity.

Conclusions: By only relying on neighbor documents, the User-oriented Semantic Indexer does not need a representative learning set. Yet, it provides better results than the other approaches by giving a consistent annotation scored with a global criterion - instead of one score per concept.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Abstracting and Indexing*
  • Algorithms*
  • Humans
  • Information Storage and Retrieval*
  • Medical Subject Headings
  • Natural Language Processing*
  • Pattern Recognition, Automated
  • Semantics*
  • User-Computer Interface*
  • Vocabulary, Controlled