Knowledge discovery and data mining to assist natural language understanding

Proc AMIA Symp. 1998:835-9.

Abstract

As natural language processing systems become more common in clinical use, methods for interpreting their output become increasingly important. These methods currently require the effort of a domain expert, who must build specific queries and rules for interpreting the processor output. Knowledge discovery and data mining tools can instead generate these queries and rules automatically. C5.0, a decision tree generator, was used to create a rule base for a natural language understanding system. A general-purpose natural language processor using this rule base was tested on a set of 200 chest radiograph reports. When a small set of reports classified by physicians was used as the training set, the generated rule base performed as well as lay persons but worse than physicians. When a larger set of reports, classified by ICD-9 codes, was used for training, the rule base performed worse than both the physicians and the lay persons. It appears that a larger, more accurately classified training set is needed to improve the performance of the method.
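The general idea of inducing a rule base from classified reports can be illustrated with a minimal sketch. This is not C5.0 and not the system described in the abstract: it is a simplified information-gain decision tree over binary features, written only to show how a tree learned from labeled examples can be flattened into if-then rules. The feature names (`infiltrate`, `negated`, `effusion`), the labels, and the six training rows are hypothetical toy values, not data from the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_feature(rows, labels, features):
    """Pick the binary feature with the highest information gain."""
    base = entropy(labels)
    def gain(feature):
        remainder = 0.0
        for value in (True, False):
            subset = [lab for row, lab in zip(rows, labels)
                      if row[feature] == value]
            if subset:
                remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder
    return max(features, key=gain)

def build_tree(rows, labels, features):
    """Recursively split until a node is pure or no features remain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    feature = best_feature(rows, labels, features)
    remaining = [f for f in features if f != feature]
    node = {"feature": feature}
    for value in (True, False):
        idx = [i for i, row in enumerate(rows) if row[feature] == value]
        if idx:
            node[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
        else:
            node[value] = majority  # empty branch: fall back to majority class
    return node

def tree_to_rules(node, conditions=()):
    """Flatten the tree into (conditions, label) pairs, one per leaf."""
    if not isinstance(node, dict):
        return [(conditions, node)]
    rules = []
    for value in (True, False):
        rules += tree_to_rules(node[value],
                               conditions + ((node["feature"], value),))
    return rules

def classify(node, row):
    """Follow the tree from the root to a leaf for one feature row."""
    while isinstance(node, dict):
        node = node[row[node["feature"]]]
    return node

# Hypothetical toy training set: features a language processor might emit
# for a report, paired with a hypothetical expert classification.
features = ["infiltrate", "negated", "effusion"]
rows = [
    {"infiltrate": True,  "negated": False, "effusion": False},
    {"infiltrate": True,  "negated": True,  "effusion": False},
    {"infiltrate": False, "negated": False, "effusion": True},
    {"infiltrate": True,  "negated": False, "effusion": True},
    {"infiltrate": False, "negated": False, "effusion": False},
    {"infiltrate": True,  "negated": True,  "effusion": True},
]
labels = ["positive", "negative", "negative",
          "positive", "negative", "negative"]

tree = build_tree(rows, labels, features)
for conditions, label in tree_to_rules(tree):
    clause = " and ".join(f"{name} is {value}" for name, value in conditions)
    print(f"if {clause} then {label}")
```

Each printed rule corresponds to one root-to-leaf path. Because the tree can only be as good as the labels it is trained on, a sketch like this also makes the abstract's closing point concrete: noisy training classifications propagate directly into the generated rules.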

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Artificial Intelligence*
  • Decision Trees
  • Heart Diseases / diagnostic imaging
  • Humans
  • Information Storage and Retrieval*
  • Lung Diseases / diagnostic imaging
  • Natural Language Processing*
  • Radiography, Thoracic*
  • Sensitivity and Specificity