Comparison of a semi-automatic annotation tool and a natural language processing application for the generation of clinical statement entries

J Am Med Inform Assoc. 2015 Jan;22(1):132-42. doi: 10.1136/amiajnl-2014-002991. Epub 2014 Oct 20.

Abstract

Background and objective: Electronic medical records with encoded entries should enhance the semantic interoperability of document exchange. However, it remains a challenge to encode narrative concepts and to transform the coded concepts into a standard entry-level document. This study aimed to use a novel approach to generate entry-level interoperable clinical documents.

Methods: Using the HL7 clinical document architecture (CDA) as an example, we developed three pipelines for generating entry-level CDA documents: a semi-automatic annotation pipeline (SAAP), a natural language processing (NLP) pipeline, and a third pipeline that merged the two. We randomly selected 50 test documents from the i2b2 corpora to evaluate the performance of the three pipelines.
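The target output of all three pipelines is an entry-level CDA document, in which each clinical term is serialized as a coded XML element. As a minimal illustrative sketch (not the study's implementation; the SNOMED CT code and display name are placeholder examples), an Observation term can be wrapped in a CDA `entry`/`observation` pair like this:

```python
# Sketch: serialize a coded clinical term as an HL7 CDA entry-level
# Observation element. The code "386661006" (Fever) and the SNOMED CT
# OID are illustrative examples, not values from the study.
import xml.etree.ElementTree as ET

CDA_NS = "urn:hl7-org:v3"

def make_observation_entry(code: str, display: str, code_system: str) -> ET.Element:
    """Build a CDA <entry> wrapping an <observation> with a coded value."""
    ET.register_namespace("", CDA_NS)
    entry = ET.Element(f"{{{CDA_NS}}}entry")
    obs = ET.SubElement(entry, f"{{{CDA_NS}}}observation",
                        classCode="OBS", moodCode="EVN")
    ET.SubElement(obs, f"{{{CDA_NS}}}code",
                  code=code, displayName=display,
                  codeSystem=code_system, codeSystemName="SNOMED CT")
    return entry

entry = make_observation_entry("386661006", "Fever", "2.16.840.1.113883.6.96")
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

A Procedure term would be serialized analogously, with a `procedure` element in place of `observation`.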

Results: The 50 randomly selected test documents contained 9365 words, including 588 Observation terms and 123 Procedure terms. For the Observation terms, the merged pipeline had a significantly higher F-measure than the NLP pipeline (0.89 vs 0.80, p<0.0001), but a similar F-measure to that of the SAAP (0.89 vs 0.87). For the Procedure terms, the F-measure was not significantly different among the three pipelines.
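The F-measure used above is the standard harmonic mean of precision and recall. A minimal sketch of the computation (the true/false positive and false negative counts below are illustrative only, not counts from the study):

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts: precision = recall = 80/90, so F1 ≈ 0.89
print(round(f_measure(80, 10, 10), 2))  # → 0.89
```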

Conclusions: Combining a semi-automatic annotation approach with an NLP application appears to be a practical solution for generating entry-level interoperable clinical documents.

Keywords: CDA entry level; auto-complete technique; natural language processing.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Automation*
  • Databases as Topic
  • Electronic Health Records*
  • Health Level Seven
  • Humans
  • Natural Language Processing*
  • User-Computer Interface