An environment for relation mining over richly annotated corpora: the case of GENIA

BMC Bioinformatics. 2006 Nov 24;7 Suppl 3(Suppl 3):S3. doi: 10.1186/1471-2105-7-S3-S3.

Abstract

Background: The biomedical domain is witnessing rapid growth in the volume of published scientific results, which makes it increasingly difficult to filter out the core information. There is a real need for support tools that 'digest' the published results and extract the most important information.

Results: We describe and evaluate an environment supporting the extraction of domain-specific relations, such as protein-protein interactions, from a richly annotated corpus. We use full, deep-linguistic parsing and manually created, versatile patterns, which express a large set of syntactic alternations, plus semantic ontology information.
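
To make the idea of one pattern covering several syntactic alternations more concrete, here is a minimal sketch in Python. It is not the system evaluated in this study: the data structures, the dependency labels (`subj`, `obj`, `subj-pass`, `pobj-by`), the verb list, the `protein` type label and the placeholder names `protA` and `protB` are all assumptions made for illustration. The sketch matches an interaction verb against a dependency parse and accepts a relation only when both arguments carry the required semantic type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int
    lemma: str
    sem_type: str = ""  # hypothetical ontology label, e.g. "protein"


@dataclass(frozen=True)
class Dep:
    head: int  # index of the governing token
    rel: str   # dependency label (labels here are illustrative)
    dep: int   # index of the dependent token


# Hypothetical trigger verbs for an interaction relation.
INTERACTION_VERBS = {"activate", "bind", "inhibit"}

# Each alternation names the dependency labels that carry the agent
# and the target of the verb in that syntactic configuration.
ALTERNATIONS = [
    ("subj", "obj"),           # active:  A activates B
    ("pobj-by", "subj-pass"),  # passive: B is activated by A
]


def extract_relations(tokens, deps, required_type="protein"):
    """Return (agent, verb, target) triples licensed by the pattern."""
    by_idx = {t.idx: t for t in tokens}
    results = []
    for verb in tokens:
        if verb.lemma not in INTERACTION_VERBS:
            continue
        # Collect the verb's direct dependents, keyed by dependency label.
        args = {d.rel: by_idx[d.dep] for d in deps if d.head == verb.idx}
        for agent_rel, target_rel in ALTERNATIONS:
            agent = args.get(agent_rel)
            target = args.get(target_rel)
            if agent is None or target is None:
                continue
            # Semantic filter: both arguments must have the required type.
            if agent.sem_type == required_type and target.sem_type == required_type:
                results.append((agent.lemma, verb.lemma, target.lemma))
    return results


# "protA activates protB" (active voice)
active = extract_relations(
    [Token(0, "protA", "protein"), Token(1, "activate"), Token(2, "protB", "protein")],
    [Dep(1, "subj", 0), Dep(1, "obj", 2)],
)

# "protB is activated by protA" (passive voice)
passive = extract_relations(
    [Token(0, "protB", "protein"), Token(1, "activate"), Token(2, "protA", "protein")],
    [Dep(1, "subj-pass", 0), Dep(1, "pobj-by", 2)],
)

print(active)
print(passive)
```

Both sentences yield the same triple because the pattern is stated over dependency relations rather than surface word order, which is the kind of abstraction a finite-state pattern over token sequences would have to spell out case by case.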

Conclusion: The experiments show that the approach described is capable of delivering high-precision results while maintaining sufficient levels of recall. The high level of abstraction of the rules used by the system, which are considerably more powerful and versatile than finite-state approaches, allows speedy interactive development and validation.

Publication types

  • Evaluation Study

MeSH terms

  • Abstracting and Indexing*
  • Algorithms
  • Artificial Intelligence*
  • Databases, Factual
  • Information Storage and Retrieval / methods*
  • Natural Language Processing*
  • Periodicals as Topic*
  • Semantics
  • Software
  • Terminology as Topic*
  • Vocabulary, Controlled*