Improved bioimpedance spectroscopy tissue classification through data augmentation from generative adversarial networks

Med Biol Eng Comput. 2024 Apr;62(4):1177-1189. doi: 10.1007/s11517-023-03006-7. Epub 2023 Dec 29.

Abstract

Bioimpedance spectroscopy is a tissue classification technique with many clinical applications. Like other data-driven methods, it requires large amounts of data to accurately distinguish similar classes of tissue. Classifiers trained on small datasets typically suffer from over-fitting and fail to generalise to previously unseen data. However, a large in vivo or ex vivo spectral database is difficult to obtain: data collection is usually limited to studies that occur infrequently, and public datasets are rarely available. One solution is to artificially enlarge the training dataset by creating modified, yet accurate, copies of the original data. The most common techniques in spectral classification are to add noise to copies of the original data, over-sample it, or randomly interpolate between pairs of original samples. However, simply perturbing or interpolating the data does not guarantee that the new dataset captures the key features of the original data needed for accurate classification. This study proposes a novel way to augment bioimpedance spectral data. It uses generative adversarial networks (GANs), a model in which two neural networks (NNs) compete with each other: one NN artificially manufactures data that could be mistaken for real data, while the second NN tries to identify which of the data it receives have been artificially created. The first NN then iteratively adapts its output until the second NN can no longer flag artificially created data. The result is a new dataset that faithfully represents the features of the original data. In this study, three GAN architectures are used: the vanilla GAN, the deep convolutional GAN, and the Wasserstein GAN. The generated data is then used to train five classification methods, and their results are compared to a baseline that uses only the original data.
The results from a dataset of 13 different tissue classes show that data generated by the deep convolutional GAN is statistically most similar to the original data and improves classification accuracy by 15% compared with the same model trained only on the original data. The Wasserstein GAN architecture also provides significant improvements, raising accuracy by up to 24%.
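
The adversarial training loop described in the abstract can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: the study trains neural networks (vanilla, deep convolutional, and Wasserstein GANs) on impedance spectra, whereas this toy assumes scalar "real" data drawn from a Gaussian, a linear generator x = a*z + b, and a logistic discriminator D(x) = sigmoid(w*x + c). All names and parameter values (`train_toy_gan`, `real_mean`, `lr`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(real_mean=4.0, real_std=0.5, steps=2000, lr=0.01, seed=0):
    """Alternate discriminator and generator updates on 1-D toy data.

    Assumed toy models (not from the paper):
      generator      x_fake = a * z + b,  z ~ N(0, 1)
      discriminator  D(x) = sigmoid(w * x + c)
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters

    for _ in range(steps):
        # Discriminator step: minimise -log D(x_real) - log(1 - D(x_fake)),
        # i.e. learn to flag generated samples.
        x_real = rng.gauss(real_mean, real_std)
        x_fake = a * rng.gauss(0.0, 1.0) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = -(1.0 - d_real) * x_real + d_fake * x_fake
        grad_c = -(1.0 - d_real) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimise -log D(x_fake) (non-saturating loss),
        # i.e. adapt the output so the discriminator accepts it as real.
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b
        d_fake = sigmoid(w * x_fake + c)
        grad_x = -(1.0 - d_fake) * w  # d(-log D(x)) / dx
        a -= lr * grad_x * z
        b -= lr * grad_x

    return a, b, w, c


if __name__ == "__main__":
    a, b, w, c = train_toy_gan()
    print(f"generator: x = {a:.3f} * z + {b:.3f}")
    print(f"discriminator: D(x) = sigmoid({w:.3f} * x + {c:.3f})")
```

Because this discriminator is linear in x, it can mainly detect a shift in the mean of the generated samples, and plain alternating gradient steps may oscillate rather than settle. Instabilities of this kind are one common motivation for alternatives such as the Wasserstein GAN, although the toy above makes no attempt to reproduce the architectures compared in the study.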

Keywords: Clinical tools; Data augmentation; Electrical impedance spectroscopy; Generative adversarial network.

MeSH terms

  • Data Collection
  • Databases, Factual
  • Neural Networks, Computer*

Grants and funding