Medical Specialty Classification Based on Semiadversarial Data Augmentation

Huan Zhang; Dong Zhu; Hao Tan; Muhammad Shafiq; Zhaoquan Gu

doi:10.1155/2023/4919371

Comput Intell Neurosci. 2023 Oct 17;2023:4919371. doi: 10.1155/2023/4919371. eCollection 2023.

Authors

Huan Zhang¹ ², Dong Zhu¹, Hao Tan¹ ², Muhammad Shafiq¹, Zhaoquan Gu² ³

Affiliations

¹ Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China.
² Department of New Networks, Peng Cheng Laboratory, Shenzhen, China.
³ School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China.

Abstract

The rapidly increasing adoption of electronic health record (EHR) systems has made automated medical specialty classification an important research field. Medical specialty classification not only improves the retrieval efficiency of EHR systems and helps general practitioners identify urgent patient issues but is also useful for studying the practice and validity of clinical referral patterns. However, currently available medical note data are imbalanced and insufficient. In addition, medical specialty classification is a multicategory problem, and it is difficult to remove sensitive information from large numbers of medical notes and to label them. To address these problems, we propose a data augmentation method based on adversarial attacks. The semiadversarial examples generated during the dynamic process of an adversarial attack are added to the training set as augmented examples, which effectively expands the coverage of the training data over the decision space. In addition, because nouns carry critical information in medical notes, we design a classification framework that incorporates probabilistic information about nouns, with confidence recalculation after the softmax layer. We validate the proposed method on an 18-class dataset with extremely imbalanced data. In comparison experiments with four benchmarks, our method achieves the best accuracy and F1 score, with an average improvement of 14.9%.
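
The idea of keeping intermediate attack states as augmented training examples can be illustrated with a toy sketch. The abstract does not specify the attack algorithm or the classifier, so everything below is an assumption made for illustration: the greedy synonym-substitution attack, the bag-of-words scorer, and the names `toy_classifier`, `collect_semiadversarial`, `WEIGHTS`, and `SYNONYMS` are all hypothetical. The sketch keeps only perturbed texts that the model still assigns to the original label, and stops once the prediction flips.

```python
import math
from typing import Callable, Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_classifier(tokens: Sequence[str],
                   weights: Dict[str, List[float]],
                   n_classes: int) -> List[float]:
    """Hypothetical bag-of-words model: sums per-token class scores."""
    scores = [0.0] * n_classes
    for tok in tokens:
        for c, w in enumerate(weights.get(tok, [0.0] * n_classes)):
            scores[c] += w
    return softmax(scores)


def collect_semiadversarial(tokens: Sequence[str],
                            label: int,
                            predict: Callable[[Sequence[str]], List[float]],
                            synonyms: Dict[str, List[str]]) -> List[List[str]]:
    """Run a greedy substitution attack (an assumed stand-in for the
    paper's attack) and return the intermediate texts that are still
    classified as `label`."""
    current = list(tokens)
    kept: List[List[str]] = []
    for i, tok in enumerate(tokens):
        best, best_p = None, predict(current)[label]
        # Pick the substitution that lowers the true-label confidence most.
        for cand in synonyms.get(tok, []):
            trial = current[:i] + [cand] + current[i + 1:]
            p = predict(trial)[label]
            if p < best_p:
                best, best_p = trial, p
        if best is None:
            continue  # no substitution at this position weakens the prediction
        current = best
        probs = predict(current)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != label:
            break  # fully adversarial: the label flipped, so do not keep it
        kept.append(list(current))
    return kept


# Hypothetical two-class toy data (class 0 vs. class 1).
WEIGHTS = {
    "chest": [2.0, 0.0], "thorax": [0.5, 0.0],
    "pain": [1.0, 0.0], "ache": [0.2, 0.0],
    "rash": [0.0, 2.0],
}
SYNONYMS = {"chest": ["thorax"], "pain": ["ache"]}


def predict(tokens: Sequence[str]) -> List[float]:
    return toy_classifier(tokens, WEIGHTS, 2)


augmented = collect_semiadversarial(["chest", "pain"], 0, predict, SYNONYMS)
flipped = collect_semiadversarial(["chest"], 0, predict, {"chest": ["rash"]})
```

In this sketch, each kept text would be added to the training set with its original label. The noun-probability confidence recalculation mentioned in the abstract is not modelled here.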

MeSH terms

Electronic Health Records*
Humans
Medicine*
Software