Developing Sustainable Classification of Diseases via Deep Learning and Semi-Supervised Learning

Chunwu Yin; Zhanbo Chen

doi:10.3390/healthcare8030291

Healthcare (Basel). 2020 Aug 24;8(3):291. doi: 10.3390/healthcare8030291.

Authors

Chunwu Yin¹, Zhanbo Chen²,³

Affiliations

¹ School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China.
² School of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003, China.
³ Center of Guangxi Cooperative Innovation for Education Performance Assessment, Guangxi University of Finance and Economics, Nanning 530003, China.

Abstract

Disease classification based on machine learning has become a crucial research topic in the fields of genetics and molecular biology. Generally, disease classification is treated as a supervised learning problem; i.e., it requires a large number of labelled samples to achieve good classification performance. In the majority of cases, however, labelled samples are hard to obtain, so the amount of training data is limited. At the same time, many unclassified (unlabelled) sequences have been deposited in public databases, and these may help the training procedure. Training on labelled and unlabelled samples together is called semi-supervised learning and is very useful in many applications. Self-training can be implemented by adding samples in order from high to low confidence, which prevents noisy samples from affecting the robustness of semi-supervised learning in the training process. The deep forest method with the hyperparameter settings used in this paper can achieve excellent performance. Therefore, in this work, we propose a novel approach that combines a deep learning model with semi-supervised self-training to improve performance in disease classification; it uses unlabelled samples in an update mechanism designed to increase the number of high-confidence pseudo-labelled samples. The experimental results show that our proposed model can achieve good performance in disease classification and disease-causing gene identification.

Keywords: deep learning; disease classification; self-training; semi-supervised learning.
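
The self-training idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a simple nearest-centroid classifier stands in for the deep forest model, and the function names (centroids, predict_proba, self_train), the fixed confidence threshold of 0.8, and the toy data are all assumptions made for illustration. In each round the classifier is refit on the current labelled set, and only unlabelled samples whose predicted class probability reaches the threshold are added as pseudo-labelled samples.

```python
import math


def centroids(X, y):
    """Return the mean feature vector of each class label."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_proba(cents, x):
    """Softmax over negative Euclidean distances to each class centroid."""
    scores = {c: math.exp(-math.dist(x, m)) for c, m in cents.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Grow the labelled set with high-confidence pseudo-labels.

    The threshold and round limit are illustrative assumptions,
    not values taken from the paper.
    """
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X, y)  # refit the stand-in classifier
        remaining, added = [], 0
        for x in pool:
            proba = predict_proba(cents, x)
            label, conf = max(proba.items(), key=lambda kv: kv[1])
            if conf >= threshold:
                # Confident prediction: accept it as a pseudo-label.
                X.append(x)
                y.append(label)
                added += 1
            else:
                # Low confidence: leave the sample unlabelled for now.
                remaining.append(x)
        pool = remaining
        if added == 0 or not pool:
            break
    return X, y, pool


# Toy data (hypothetical): two labelled points and three unlabelled ones.
X_lab = [[0.0, 0.0], [10.0, 10.0]]
y_lab = [0, 1]
X_unlab = [[0.5, 0.5], [9.5, 9.0], [5.0, 5.0]]

X_all, y_all, left_over = self_train(X_lab, y_lab, X_unlab)
print(y_all, left_over)
```

In this toy run the two unlabelled points lying close to a labelled sample are pseudo-labelled, while the point midway between the classes stays in the unlabelled pool because its confidence never reaches the threshold. A schedule that lowers the threshold over successive rounds would be one way to approximate the high-to-low confidence ordering that the abstract mentions.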

Grants and funding

19BTJ053/National Social Science Fund of China