A terminological and ontological analysis of the NCI Thesaurus

Methods Inf Med. 2005;44(4):498-507.

Abstract

Objective: The National Cancer Institute Thesaurus is described by its authors as "a biomedical vocabulary that provides consistent, unambiguous codes and definitions for concepts used in cancer research" that "exhibits ontology-like properties in its construction and use". We performed a qualitative analysis of the Thesaurus to assess its conformity with principles of good practice in terminology and ontology design.

Materials and methods: We used both the online browsable version of the Thesaurus and its OWL representation (version 04.08b, released on August 2, 2004), measuring each against the requirements put forward in relevant ISO terminology standards and against ontological principles advanced in the recent literature.

Results: We found many mistakes and inconsistencies in the term-formation principles used and in the underlying knowledge representation system, as well as missing or inappropriately assigned verbal and formal definitions.

Conclusion: Version 04.08b of the NCI Thesaurus suffers from the same broad range of problems that have been observed in other biomedical terminologies. For its further development, we recommend the use of a more principled approach that allows the Thesaurus to be tested not just for internal consistency but also for its degree of correspondence to that part of reality which it is designed to represent.

MeSH terms

  • Computational Biology / standards*
  • Databases, Factual*
  • Dictionaries as Topic
  • Humans
  • Information Storage and Retrieval
  • Medical Informatics Computing
  • National Institutes of Health (U.S.)*
  • Natural Language Processing*
  • Neoplasms / classification*
  • Systematized Nomenclature of Medicine
  • Terminology as Topic*
  • Unified Medical Language System
  • United States
  • Vocabulary, Controlled*