Leveraging deep learning algorithms for synthetic data generation to design and analyze biological networks

J Biosci. 2022;47:43.

Abstract

Synthetic data is playing an increasingly prominent role in data and machine learning workflows, where it is used to build better models and to conduct analyses with stronger statistical inference. In healthcare and biomedical research, synthetic data may take structured or unstructured formats. Concomitant with the adoption of synthetic data, deep learning, a sub-discipline of machine learning, has taken the world by storm. At larger scales, deep learning methods tend to outperform traditional methods in regression and classification tasks. These techniques are also used in generative modeling and are thus prime candidates for generating synthetic data in both structured and unstructured formats. Here, we emphasize the generation of synthetic data in healthcare and biomedical research using deep learning methods for unstructured data formats such as text and images. Deep learning methods are built on neural networks, and in the context of generative modeling, several neural network architectures can create new synthetic data for the problem at hand, including, but not limited to, recurrent neural networks (RNNs), variational autoencoders (VAEs), and generative adversarial networks (GANs). To better understand these methods, we look at specific case studies: generating realistic clinical notes for a patient, generating synthetic DNA sequences, and enriching experimental data collected during the study of heterotypic cultures of cancer cells.
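
As a rough illustration of how an RNN can produce synthetic DNA sequences, the sketch below samples nucleotides one at a time. Each new base is drawn from a softmax distribution computed from the RNN's hidden state, and that state is updated after every step. This is not the authors' model. The weights are random and untrained, so the output has no biological meaning. The class and function names (`TinyRNN`, `sample_sequence`) and the hidden size are hypothetical choices made for illustration. In practice, the weights would first be fit to real sequences, usually with a deep learning framework.

```python
import math
import random

ALPHABET = "ACGT"  # nucleotide vocabulary


def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNN:
    """Hypothetical Elman-style RNN over one-hot nucleotides.

    Weights are random and NOT trained; this only demonstrates the
    autoregressive generation loop, not a useful DNA model.
    """

    def __init__(self, hidden_size=8, seed=0):
        rng = random.Random(seed)
        v = len(ALPHABET)
        self.h = hidden_size
        self.W_xh = [[rng.gauss(0, 0.5) for _ in range(v)] for _ in range(hidden_size)]
        self.W_hh = [[rng.gauss(0, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
        self.W_hy = [[rng.gauss(0, 0.5) for _ in range(hidden_size)] for _ in range(v)]

    def step(self, x_index, h_prev):
        # h_t = tanh(W_xh x_t + W_hh h_{t-1}); x_t is one-hot, so W_xh x_t selects a column.
        h = [
            math.tanh(self.W_xh[i][x_index] + sum(self.W_hh[i][j] * h_prev[j] for j in range(self.h)))
            for i in range(self.h)
        ]
        # Output logits over A/C/G/T, turned into next-base probabilities.
        logits = [sum(self.W_hy[k][j] * h[j] for j in range(self.h)) for k in range(len(ALPHABET))]
        return h, softmax(logits)


def sample_sequence(model, length, start="A", seed=1):
    """Generate a sequence of `length` bases (length >= 1), one base per step."""
    rng = random.Random(seed)
    h = [0.0] * model.h
    idx = ALPHABET.index(start)
    out = [start]
    for _ in range(length - 1):
        h, probs = model.step(idx, h)
        # Sample the next base from the predicted distribution and feed it back in.
        idx = rng.choices(range(len(ALPHABET)), weights=probs, k=1)[0]
        out.append(ALPHABET[idx])
    return "".join(out)


if __name__ == "__main__":
    print(sample_sequence(TinyRNN(), 20))
```

Feeding each sampled base back in as the next input is what makes the model generative. The same loop, applied over words or characters instead of nucleotides, underlies RNN-based generation of clinical text.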

MeSH terms

  • Algorithms
  • Deep Learning*
  • Humans
  • Machine Learning
  • Neural Networks, Computer