Knowledge discovery and data mining to assist natural language understanding

A Wilcox; G Hripcsak

Proc AMIA Symp. 1998:835-9.

Authors

A Wilcox¹, G Hripcsak

Affiliation

¹ Department of Medical Informatics, Columbia University, New York, NY, USA.

PMID: 9929336
PMCID: PMC2232072

Abstract

As natural language processing systems are used more frequently in clinical settings, methods for interpreting the output of these programs become increasingly important. Such methods typically require a domain expert, who must build specific queries and rules for interpreting the processor output. Knowledge discovery and data mining tools can be used instead of a domain expert to generate these queries and rules automatically. C5.0, a decision tree generator, was used to create a rule base for a natural language understanding system. A general-purpose natural language processor using this rule base was tested on a set of 200 chest radiograph reports. When a small set of reports classified by physicians was used as the training set, the generated rule base performed as well as lay persons but worse than physicians. When a larger set of reports, classified using ICD-9 coding, was used for training, the rule base performed worse than both the physicians and the lay persons. It appears that a larger, more accurate training set is needed to improve the performance of the method.
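The abstract describes inducing interpretation rules from labeled processor output with a decision tree generator. C5.0 itself is not reproduced here; as a rough illustration only, the sketch below substitutes the simpler ID3 information-gain criterion (no pruning, boosting, or other C5.0 refinements) on binary features. The feature names ("infiltrate", "negated", "effusion"), the class labels, and the six training rows are hypothetical stand-ins for coded processor output and physician classifications, not data from the study. Each root-to-leaf path of the learned tree is then flattened into an if-then rule, which is one way a tree can serve as a rule base.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_feature(rows, labels, features):
    """Return the binary feature with the largest information gain (ID3), or None."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for f in features:
        gain = base
        for value in (True, False):
            subset = [y for x, y in zip(rows, labels) if x[f] == value]
            if subset:
                gain -= len(subset) / len(labels) * entropy(subset)
        if gain > best_gain:
            best, best_gain = f, gain
    return best

def build_tree(rows, labels, features):
    """Recursively grow a tree; internal nodes are (feature, branches), leaves are labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    f = best_feature(rows, labels, features)
    if f is None:  # no remaining feature reduces entropy
        return majority
    rest = [g for g in features if g != f]
    branches = {}
    for value in (True, False):
        idx = [i for i, x in enumerate(rows) if x[f] == value]
        branches[value] = (
            build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
            if idx else majority
        )
    return (f, branches)

def to_rules(tree, conditions=()):
    """Flatten a tree into (conditions, label) rules, one per root-to-leaf path."""
    if not isinstance(tree, tuple):
        return [(conditions, tree)]
    f, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += to_rules(subtree, conditions + ((f, value),))
    return rules

def classify(tree, row):
    """Follow the branches matching the row's feature values down to a leaf label."""
    while isinstance(tree, tuple):
        f, branches = tree
        tree = branches[row[f]]
    return tree

# Hypothetical coded output for six reports, with hypothetical reviewer labels.
rows = [
    {"infiltrate": True,  "negated": False, "effusion": False},
    {"infiltrate": True,  "negated": True,  "effusion": False},
    {"infiltrate": False, "negated": False, "effusion": True},
    {"infiltrate": True,  "negated": False, "effusion": True},
    {"infiltrate": False, "negated": False, "effusion": False},
    {"infiltrate": False, "negated": True,  "effusion": False},
]
labels = ["yes", "no", "no", "yes", "no", "no"]

tree = build_tree(rows, labels, ["infiltrate", "negated", "effusion"])
rules = to_rules(tree)
for conds, label in rules:
    print(" AND ".join(f"{f}={v}" for f, v in conds), "->", label)
```

On this toy set the tree splits first on "infiltrate" and then on "negated", yielding three rules. With a small training set like this, a single mislabeled report can change which split is chosen, which is consistent with the abstract's conclusion that training set size and accuracy matter.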

Publication types

Research Support, U.S. Gov't, P.H.S.

MeSH terms

Artificial Intelligence*
Decision Trees
Heart Diseases / diagnostic imaging
Humans
Information Storage and Retrieval*
Lung Diseases / diagnostic imaging
Natural Language Processing*
Radiography, Thoracic*
Sensitivity and Specificity
