Machine Learning Approaches on Diagnostic Term Encoding With the ICD for Clinical Documentation

IEEE J Biomed Health Inform. 2018 Jul;22(4):1323-1329. doi: 10.1109/JBHI.2017.2743824. Epub 2017 Aug 24.

Abstract

This work focuses on data mining applied to the clinical documentation domain. Diagnostic terms (DTs) are used as keywords to retrieve valuable information from electronic health records. At present, they are encoded manually by experts following the International Classification of Diseases (ICD). The goal of this work is to explore how text mining can assist DT encoding. From the machine learning (ML) perspective, this is a high-dimensional classification task, as it comprises thousands of codes. This work investigates a robust representation of the instances to improve ML results. The proposed system finds the right ICD code among more than 1500 possible ICD codes with 92% precision for the main disease (primary class) and 88% for the main disease together with the nonessential modifiers (fully specified class). The methodology employed is simple and portable. According to experts from public hospitals, the system is very useful, in particular for documentation and pharmacosurveillance services; they reported an accuracy of 91.2% on a small, randomly extracted test set. Together with this paper, we have made the software publicly available to help the clinical and research community.
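
To make the task concrete, the following is a minimal sketch of DT encoding framed as text classification. It is not the authors' system: the abstract does not specify the instance representation or the classifier, so the character-trigram representation, the nearest-neighbour rule, the names `trigrams`, `cosine`, `predict`, and `TRAIN`, and the three-entry training set are all assumptions chosen for illustration. The ICD-10 codes are used only as example labels.

```python
from collections import Counter
from math import sqrt

def trigrams(text):
    """Represent a diagnostic term as a bag of character trigrams (assumed representation)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy training set: (diagnostic term, ICD-10 code).
# A real system would learn from thousands of expert-encoded terms.
TRAIN = [
    ("type 2 diabetes mellitus", "E11"),
    ("essential hypertension", "I10"),
    ("asthma", "J45"),
]

def predict(term):
    """Return the code of the most similar training term (1-nearest neighbour)."""
    vec = trigrams(term)
    best_term, best_code = max(TRAIN, key=lambda pair: cosine(vec, trigrams(pair[0])))
    return best_code

if __name__ == "__main__":
    for dt in ["diabetes mellitus type 2", "hypertension, essential", "asthma"]:
        print(dt, "->", predict(dt))
```

Character-level features are used here because they tolerate word reordering and minor spelling variation in free-text terms; with more than 1500 candidate codes, a learned classifier over a richer representation would be expected to replace the nearest-neighbour lookup.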

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining / methods
  • Documentation / methods*
  • Electronic Health Records*
  • Humans
  • International Classification of Diseases*
  • Machine Learning*
  • Natural Language Processing