Comparing ICD9-encoded diagnoses and NLP-processed discharge summaries for clinical trials pre-screening: a case study

Li Li; Herbert S Chase; Chintan O Patel; Carol Friedman; Chunhua Weng

AMIA Annu Symp Proc. 2008 Nov 6:2008:404-8.

Authors

Li Li¹, Herbert S Chase, Chintan O Patel, Carol Friedman, Chunhua Weng

Affiliation

¹ Department of Biomedical Informatics, Columbia University, New York, NY, USA.

PMID: 18999285
PMCID: PMC2656007

Abstract

The prevalence of electronic medical record (EMR) systems has made mass screening for clinical trials viable through secondary use of clinical data, which often exist in both structured and free-text formats. The tradeoffs of using information in either format for clinical trial screening are understudied. This paper compares the results of clinical trial eligibility queries over ICD9-encoded diagnoses and NLP-processed textual discharge summaries. The strengths and weaknesses of both data sources are summarized along the following dimensions: information completeness, expressiveness, code granularity, and accuracy of temporal information. We conclude that NLP-processed patient reports contribute important supplementary information for eligibility screening and should be used in combination with structured data.

Publication types

Comparative Study
Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Artificial Intelligence
Clinical Trials as Topic / methods*
Diagnosis*
Medical Records Systems, Computerized*
Natural Language Processing*
New York
Patient Discharge*
Patient Selection*
Pattern Recognition, Automated / methods*
