Identifying Diabetes in Clinical Notes in Hebrew: A Novel Text Classification Approach Based on Word Embedding

Stud Health Technol Inform. 2019 Aug 21;264:393-397. doi: 10.3233/SHTI190250.

Abstract

NimbleMiner is a word embedding-based, language-agnostic natural language processing system for clinical text classification. NimbleMiner had previously been applied to English text; this study applied it to a large sample of inpatient clinical notes in Hebrew to identify instances of diabetes mellitus. The study data included 521,278 clinical notes (one admission and one discharge note per patient) for 268,664 hospital admissions to the medical-surgical units of a large hospital in Israel. NimbleMiner achieved good overall performance (F-score = 0.94) when tested on a gold-standard, human-annotated dataset of 800 clinical notes. We found 15% more patients with diabetes mentioned in the clinical notes than in the coded diagnosis data. These findings of underreporting of diabetes in coded diagnosis data highlight the urgent need for tools and algorithms that help busy providers identify a range of useful information, such as whether a patient has diabetes.
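The F-score reported above summarizes agreement between the system's labels and the human gold standard. The following is a minimal Python sketch, assuming the balanced F1 measure (the harmonic mean of precision and recall) and one binary "diabetes mentioned" label per note. The function name and the toy labels are hypothetical illustrations. The abstract does not report the underlying precision and recall values, and this is not NimbleMiner's own evaluation code.

```python
from typing import Sequence


def precision_recall_f1(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> tuple[float, float, float]:
    """Compare system labels with human gold-standard labels (one per note).

    Hypothetical helper: True means "note mentions diabetes".
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(g and p for g, p in pairs)          # flagged and truly positive
    fp = sum((not g) and p for g, p in pairs)    # flagged but not positive
    fn = sum(g and (not p) for g, p in pairs)    # positive but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy labels for five notes (not study data).
    gold = [True, True, True, False, False]
    predicted = [True, True, False, True, False]
    print(precision_recall_f1(gold, predicted))
```

With the toy labels there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all equal 2/3. Because the harmonic mean penalizes imbalance, a high F-score requires both few missed diabetes mentions and few false alarms.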

Keywords: Diabetes; Natural language processing; Text classification.

MeSH terms

  • Algorithms
  • Diabetes Mellitus*
  • Electronic Health Records
  • Humans
  • Israel
  • Language
  • Natural Language Processing*