Deep-ADCA: Development and Validation of Deep Learning Model for Automated Diagnosis Code Assignment Using Clinical Notes in Electronic Medical Records

Jakir Hossain Bhuiyan Masud; Chiang Shun; Chen-Cheng Kuo; Md Mohaimenul Islam; Chih-Yang Yeh; Hsuan-Chia Yang; Ming-Chin Lin

doi:10.3390/jpm12050707

J Pers Med. 2022 Apr 28;12(5):707. doi: 10.3390/jpm12050707.

Authors

Jakir Hossain Bhuiyan Masud¹; Chiang Shun¹,²; Chen-Cheng Kuo¹; Md Mohaimenul Islam³,⁴,⁵; Chih-Yang Yeh¹; Hsuan-Chia Yang¹,³,⁶; Ming-Chin Lin¹,⁷,⁸

Affiliations

¹ Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 11031, Taiwan.
² Department of Otolaryngology, Shuang Ho Hospital, Taipei Medical University, New Taipei City 23561, Taiwan.
³ International Center for Health Information Technology (ICHIT), College of Medical Science and Technology, Taipei Medical University, Taipei 11031, Taiwan.
⁴ Research Center of Big Data and Meta-Analysis, Wan Fang Hospital, Taipei Medical University, Taipei 11696, Taiwan.
⁵ AESOP Technology, Taipei 10596, Taiwan.
⁶ Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei 11031, Taiwan.
⁷ Department of Neurosurgery, Shuang Ho Hospital, Taipei Medical University, New Taipei City 23561, Taiwan.
⁸ Taipei Neuroscience Institute, Taipei Medical University, Taipei 11031, Taiwan.

Abstract

International Classification of Diseases (ICD) codes are used to improve clinical, financial, and administrative performance. Inaccurate ICD coding can lower the quality of care and delay or prevent reimbursement. However, selecting the appropriate ICD code from a patient's clinical history is time-consuming and requires expert knowledge. The rapid spread of electronic medical records (EMRs) has generated a large amount of clinical data and provides an opportunity to predict ICD codes using deep learning models. The main objective of this study was to use a deep learning-based natural language processing (NLP) model to accurately predict ICD-10 codes, which could help providers make better clinical decisions and improve their level of service. We retrospectively collected clinical notes from five outpatient departments (OPDs) at one university teaching hospital between January 2016 and December 2016. We applied NLP techniques, including global vectors (GloVe), word2vec, and embedding techniques, to process the data. The dataset was split into independent training and testing sets comprising 90% and 10% of the entire dataset, respectively. A convolutional neural network (CNN) model was developed, and its performance was measured using precision, recall, and F-score. A total of 21,953 medical records were collected from 5016 patients. The performance of the CNN model across the five departments was clinically satisfactory (precision: 0.50 to 0.69; recall: 0.78 to 0.91). The CNN model achieved its best performance for the cardiology department, with a precision of 69%, a recall of 89%, and an F-score of 78%. The CNN model for predicting ICD-10 codes provides an opportunity to improve the quality of care. Implementing this model in real-world clinical settings could reduce the manual coding workload, enhance the efficiency of clinical coding, and support physicians in making better clinical decisions.
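The relationship between the reported cardiology metrics can be checked with a short calculation. The sketch below assumes the F-score in the abstract is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name `f_score` is illustrative only, not taken from the study's code.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is the balanced F1 (beta = 1).
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both metrics are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Cardiology department figures reported in the abstract.
cardiology_f = f_score(precision=0.69, recall=0.89)
print(round(cardiology_f, 2))  # 0.78
```

Under that assumption, a precision of 0.69 and a recall of 0.89 give an F1 of about 0.78, which agrees with the reported F-score.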

Keywords: clinical note; convolutional neural network; diagnosis codes; medication lists; natural language processing.

Grants and funding

This research was funded by the Ministry of Science and Technology, Taiwan (grant numbers 106-2634-F-038-002 and 108-2314-B-038-053-MY3) to J.H.B.M., C.-C.K., and M.-C.L., and supported by Taipei Medical University, Taiwan (learning hospital project at Wan Fang Hospital) to M.-C.L.