Latent Dirichlet allocation based generative adversarial networks

Neural Netw. 2020 Dec;132:461-476. doi: 10.1016/j.neunet.2020.08.012. Epub 2020 Sep 21.

Abstract

Generative adversarial networks have been studied extensively in recent years and power a wide range of applications, from image generation and image-to-image translation to text-to-image generation and visual recognition. These methods typically model the mapping from latent space to image with a single generator or with multiple generators. However, they have two notable drawbacks: (i) they ignore the multi-modal structure of images, and (ii) they lack model interpretability. Importantly, existing methods mostly assume that one or more generators can cover all image modes even though the structure of the data is unknown. As a result, mode dropping and mode collapse often occur during GAN training. Despite its importance for generation, exploiting the structure of the data has remained largely unexplored. In this work, aiming to generate multi-modal images and to make the model explicitly interpretable, we explore the theory of how to integrate GANs with a data-structure prior, and propose latent Dirichlet allocation based generative adversarial networks (LDAGAN). The framework can be combined with a variety of state-of-the-art single-generator GANs and achieves improved performance. Extensive experiments on synthetic and real datasets demonstrate the efficacy of LDAGAN for multi-modal image generation. An implementation of LDAGAN is available at https://github.com/Sumching/LDAGAN.
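The core idea described in the abstract, multiple generators whose mode responsibilities follow a Dirichlet prior, can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example rather than the authors' implementation: the toy sizes NUM_MODES, LATENT_DIM and IMG_DIM and the tiny Generator and Discriminator modules are assumptions made for illustration only; the actual model is in the repository linked above.

    # Hypothetical sketch of a multi-generator GAN with a Dirichlet prior
    # over mode assignments (not the authors' LDAGAN implementation).
    import torch
    import torch.nn as nn

    NUM_MODES, LATENT_DIM, IMG_DIM = 4, 64, 784  # assumed toy sizes

    class Generator(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(),
                                     nn.Linear(256, IMG_DIM), nn.Tanh())
        def forward(self, z):
            return self.net(z)

    class Discriminator(nn.Module):
        def __init__(self):
            super().__init__()
            # One real/fake score plus a posterior over which generator produced the sample.
            self.net = nn.Sequential(nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2),
                                     nn.Linear(256, 1 + NUM_MODES))
        def forward(self, x):
            out = self.net(x)
            return out[:, :1], out[:, 1:]

    generators = nn.ModuleList(Generator() for _ in range(NUM_MODES))
    disc = Discriminator()

    # Dirichlet prior over the mixing weights of the generators (image modes).
    pi = torch.distributions.Dirichlet(torch.ones(NUM_MODES)).sample()
    k = torch.distributions.Categorical(pi).sample((32,))   # mode assignment per sample
    z = torch.randn(32, LATENT_DIM)
    # Each sample is produced by the generator chosen by its mode assignment.
    fake = torch.stack([generators[int(k[i])](z[i]) for i in range(32)])
    real_fake_score, mode_posterior = disc(fake)

Because each sample is generated by the single generator selected for its mode, the mixture makes the multi-modal structure explicit and each generator can be inspected individually.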

Keywords: Generative adversarial networks (GANs); Latent Dirichlet allocation (LDA); Model interpretability; Multi-modal structure prior.

MeSH terms

  • Image Processing, Computer-Assisted / methods*
  • Neural Networks, Computer*