Inferred joint multigram models for medical term normalization according to ICD

Int J Med Inform. 2018 Feb:110:111-117. doi: 10.1016/j.ijmedinf.2017.12.007. Epub 2017 Dec 14.

Abstract

Background: Electronic Health Records (EHRs) are written in spontaneous natural language. Their terms often do not match standard terminology such as that provided by the International Classification of Diseases (ICD).

Objective: Information retrieval and exchange can be improved by using standard terminology. Our aim is to map diagnostic terms written in spontaneous language in EHRs onto the standard framework provided by the ICD.

Methods: We tackle diagnostic term normalization with Weighted Finite-State Transducers (WFSTs). Given a set of samples, these machines learn to translate sequences; in our case, they translate spontaneous representations of diagnostic terms into standard ones. They are highly flexible and easily adapted to the terminological singularities of each hospital and practitioner. In addition, we implemented a similarity metric to improve matching between spontaneous and standard terms.
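The normalization step can be pictured with a deliberately tiny sketch. It assumes a one-state, token-level transducer in the tropical semiring whose arcs (the hypothetical abbreviations `dm` and `htn`, with made-up costs) are hand-written rather than learned from samples as in the study. Because the abstract does not define the similarity metric, the sketch substitutes Python's `difflib.SequenceMatcher` ratio, weighted by each hypothesis's probability `exp(-cost)`; that weighting and the toy ICD lexicon are also assumptions. `n_best` generates normalized hypotheses and `rank_codes` orders the candidate codes.

```python
import heapq
import itertools
import math
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# HYPOTHETICAL arcs of a one-state, token-level WFST (tropical semiring):
# each input token maps to candidate output strings with a cost (-log prob).
# In the study the transducer is learned from sample pairs; these are hand-written.
Arcs = Dict[str, List[Tuple[str, float]]]
ARCS: Arcs = {
    "dm": [("diabetes mellitus", 0.1), ("dermatomyositis", 2.3)],
    "htn": [("essential hypertension", 0.2)],
}

# HYPOTHETICAL toy lexicon with ICD-10-style codes, for illustration only.
ICD: Dict[str, str] = {
    "E11": "type 2 diabetes mellitus",
    "I10": "essential hypertension",
    "M33": "dermatopolymyositis",
}


def n_best(term: str, arcs: Arcs, n: int = 3) -> List[Tuple[str, float]]:
    """Return the n lowest-cost output strings for a spontaneous term."""
    # Tokens without an arc pass through unchanged at zero cost (identity arc).
    options = [arcs.get(tok, [(tok, 0.0)]) for tok in term.lower().split()]
    # A path picks one arc per input token; its cost is the sum of arc costs.
    # Exhaustive enumeration is fine for short terms; a real WFST toolkit
    # would run a shortest-path algorithm instead.
    scored = (
        (" ".join(out for out, _ in path), sum(cost for _, cost in path))
        for path in itertools.product(*options)
    )
    return heapq.nsmallest(n, scored, key=lambda pair: pair[1])


def rank_codes(term: str, arcs: Arcs, icd: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank ICD codes by best (string similarity * hypothesis probability)."""
    hypotheses = n_best(term, arcs)
    scores = {}
    for code, description in icd.items():
        # Assumed similarity metric: difflib ratio, down-weighted by exp(-cost).
        scores[code] = max(
            SequenceMatcher(None, hyp, description).ratio() * math.exp(-cost)
            for hyp, cost in hypotheses
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With these toy values, `rank_codes("DM", ARCS, ICD)` puts `E11` first: the cheap hypothesis "diabetes mellitus" closely matches its description, while "dermatomyositis" is a good string match for `M33` but is down-weighted by its higher cost.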

Results: Of 2850 randomly selected spontaneous diagnostic terms (DTs), only 7.71% were written in the standard form found in the ICD. The WFST-based system matched spontaneous DTs to ICD codes with a Mean Reciprocal Rank (MRR) of 0.68, which means that, on average, the correct ICD code is ranked between the first and second positions among the normalized candidates. This guarantees efficient document exchange and, furthermore, information retrieval.
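MRR averages, over all queries, the reciprocal of the position at which the correct code appears (counting 0 when it is absent). An MRR of 0.68 corresponds to a harmonic-mean rank of 1/0.68 ≈ 1.47, which is why the correct code lies, on average, between the first and second positions. A minimal sketch of the metric follows; the ranks in the usage note are made-up illustration values, not data from the study.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """ranks[i] is the 1-based position of the correct ICD code in the
    candidate list for query i, or None if it never appears."""
    if not ranks:
        raise ValueError("need at least one query")
    # A missing code contributes a reciprocal rank of 0.
    return sum(1.0 / r if r is not None else 0.0 for r in ranks) / len(ranks)
```

For example, `mean_reciprocal_rank([1, 2, None, 1])` averages 1, 0.5, 0 and 1, giving 0.625.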

Conclusion: Medical term normalization was achieved with high performance. We found that directly matching spontaneous terms against standard lexicons gives unsatisfactory results, whereas generating normalized hypotheses with WFSTs helped bridge the gap between spontaneous and standard language.

Keywords: Electronic Health Records; Finite State Models; International Classification of Diseases; Normalization.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Electronic Health Records / standards*
  • Humans
  • Information Storage and Retrieval / standards*
  • International Classification of Diseases / standards*
  • Medical Informatics Applications
  • Natural Language Processing
  • Terminology as Topic*