Image manipulation with natural language using Two-sided Attentive Conditional Generative Adversarial Network

Neural Netw. 2021 Apr:136:207-217. doi: 10.1016/j.neunet.2020.09.002. Epub 2020 Sep 12.

Abstract

Altering the content of an image with photo editing tools is a tedious task for an inexperienced user, especially when modifying the visual attributes of a specific object without affecting other constituents such as the background. To simplify image manipulation and give users more control, it is better to use a simpler interface such as natural language, which also makes it possible to semantically modify parts of an image according to a given text. In this paper, we therefore address the challenge of manipulating images using natural language descriptions. We propose the Two-sidEd Attentive conditional Generative Adversarial Network (TEA-cGAN) to generate semantically manipulated images. TEA-cGAN's contribution is two-fold. The first contribution is to attend to the locations that need to be modified during generation. It introduces two types of architectures that provide fine-grained attention in both the generator and the discriminator of the Generative Adversarial Network (GAN). Specifically, the first, the Single-scale architecture, is used in the generator and focuses on modifying only the text-relevant regions of an image while leaving other regions untouched. The second, the Multi-scale architecture, extends this idea by taking image features at different scales into account. The second contribution is to generate higher-resolution images (e.g., 256 × 256), as they provide better quality and stability. Quantitative and qualitative experiments conducted on the CUB and Oxford-102 datasets confirm that the single-scale and multi-scale TEA-cGAN architectures outperform existing methods when generating 128 × 128 images as well as higher-resolution 256 × 256 images.
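
The idea of modifying only text-relevant regions while leaving the rest of the image untouched can be illustrated with a simple attention-weighted blend. The abstract does not give the actual formulation used in TEA-cGAN, so the following is only a minimal sketch under stated assumptions: a spatial attention map with values in [0, 1] is assumed to be available, and the function name `blend_with_attention` and the toy arrays are hypothetical, chosen for illustration.

```python
import numpy as np


def blend_with_attention(original, generated, attention):
    """Blend generated pixels into the original image using an attention map.

    Assumption (not taken from the paper): attention is a per-pixel weight
    in [0, 1], where 1 means "text-relevant, take the generated pixel" and
    0 means "keep the original pixel".

    original, generated: float arrays of shape (H, W, C)
    attention:           float array of shape (H, W)
    """
    if original.shape != generated.shape:
        raise ValueError("original and generated must have the same shape")
    if attention.shape != original.shape[:2]:
        raise ValueError("attention must have shape (H, W)")
    # Add a trailing channel axis so the mask broadcasts over all channels.
    mask = np.clip(attention, 0.0, 1.0)[..., np.newaxis]
    return mask * generated + (1.0 - mask) * original


# Toy data: a black 4x4 RGB "original" and an all-white "generated" image.
original = np.zeros((4, 4, 3))
generated = np.ones((4, 4, 3))

# Hypothetical attention map marking the central 2x2 block as text-relevant.
attention = np.zeros((4, 4))
attention[1:3, 1:3] = 1.0

edited = blend_with_attention(original, generated, attention)
```

In this toy case only the central 2 × 2 block of `edited` takes the generated values, and every other pixel is copied unchanged from `original`. In a real model the attention map would be predicted by the network from the text and image features rather than set by hand.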

Keywords: Generative Adversarial Network (GAN); Image manipulation; Text-to-image generation.

MeSH terms

  • Attention
  • Humans
  • Image Processing, Computer-Assisted / methods*
  • Natural Language Processing*
  • Neural Networks, Computer*