Cancer diagnosis using generative adversarial networks based on deep learning from imbalanced data

Comput Biol Med. 2021 Aug:135:104540. doi: 10.1016/j.compbiomed.2021.104540. Epub 2021 Jun 12.

Abstract

Background and objective: Cancer is a serious global disease due to its high mortality, and the key to effective treatment is accurate diagnosis. However, limited by sampling difficulty and actual sample size in clinical practice, data imbalance is a common problem in cancer diagnosis, while most conventional classification methods assume balanced data distribution. Therefore, addressing the imbalanced learning problem to improve the predictive performance of cancer diagnosis is significant.

Methods: In the study, we dissect the data imbalance prevalent in cancer gene expression data and present an improved deep learning based Wasserstein generative adversarial network (WGAN) model, which provides a reliable training progress indicator and deeply explores the characteristics of data. The WGAN generates new samples from the minority class and solves the imbalance problem at the data level.

Results: We analyze three publicly available data sets on RNA-seq of three kinds of cancer using the proposed WGAN and compare the results with those from two commonly adopted sampling methods. According to the results, through addressing the data imbalance problem, the balanced data distribution and the expanding sample size increase the prediction accuracy in all three data sets.

Conclusions: Therefore, the proposed WGAN method is superior in solving the imbalanced learning problem of gene expression data, providing significantly better prediction performance in cancer diagnosis.

Keywords: Cancer diagnosis; Deep learning; Gene expression data; Imbalanced data; Wasserstein generative adversarial networks.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Deep Learning*
  • Neoplasms* / diagnosis
  • Neoplasms* / genetics