Least kth-Order and Rényi Generative Adversarial Networks

Himesh Bhatia; William Paul; Fady Alajaji; Bahman Gharesifard; Philippe Burlina

doi:10.1162/neco_a_01416

Neural Comput. 2021 Aug 19;33(9):2473-2510. doi: 10.1162/neco_a_01416.

Authors

Himesh Bhatia¹, William Paul², Fady Alajaji³, Bahman Gharesifard⁴, Philippe Burlina⁵

Affiliations

¹ Department of Mathematics and Statistics, Queen's University, ON K7L 3N6, Canada; himesh.bhatia@queensu.ca
² Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723, U.S.A.; william.paul@jhuapl.edu
³ Department of Mathematics and Statistics, Queen's University, ON K7L 3N6, Canada; fa@queensu.ca
⁴ Department of Mathematics and Statistics, Queen's University, ON K7L 3N6, Canada; bahman.gharesifard@queensu.ca
⁵ Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723, and Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, U.S.A.; philippe.burlina@jhuapl.edu

PMID: 34412112

Abstract

We investigate the use of parameterized families of information-theoretic measures to generalize the loss functions of generative adversarial networks (GANs) with the objective of improving performance. A new generator loss function, least kth-order GAN (LkGAN), is introduced, generalizing the least squares GANs (LSGANs) by using a kth-order absolute error distortion measure with k≥1 (which recovers the LSGAN loss function when k=2). It is shown that minimizing this generalized loss function under an (unconstrained) optimal discriminator is equivalent to minimizing the kth-order Pearson-Vajda divergence. Another novel GAN generator loss function is next proposed in terms of Rényi cross-entropy functionals with order α>0, α≠1. It is demonstrated that this Rényi-centric generalized loss function, which provably reduces to the original GAN loss function as α→1, preserves the equilibrium point satisfied by the original GAN based on the Jensen-Rényi divergence, a natural extension of the Jensen-Shannon divergence. Experimental results indicate that the proposed loss functions, applied to the MNIST and CelebA data sets, under both DCGAN and StyleGAN architectures, confer performance benefits by virtue of the extra degrees of freedom provided by the parameters k and α, respectively. More specifically, experiments show improvements with regard to the quality of the generated images as measured by the Fréchet inception distance score and training stability. While it was applied to GANs in this study, the proposed approach is generic and can be used in other applications of information theory to deep learning, for example, the issues of fairness or privacy in artificial intelligence.
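
The two loss ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `target` label, and the discrete-distribution setting are assumptions made for illustration, and the paper's exact normalization (for example, the conventional factor of 1/2 in the LSGAN loss) may differ. The first function averages a kth-order absolute error over discriminator outputs on generated samples, which becomes a squared error at k=2. The second uses one common definition of the Rényi cross-entropy of order α, (1/(1−α)) log Σ p(x) q(x)^(α−1), which tends to the Shannon cross-entropy as α→1.

```python
import math


def lk_generator_loss(d_fake, target=1.0, k=2.0):
    """Mean k-th order absolute error between discriminator outputs on
    generated samples (d_fake) and a target label (hypothetical name).
    With k=2 this is a squared error, as in LSGAN up to a constant factor."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return sum(abs(d - target) ** k for d in d_fake) / len(d_fake)


def renyi_cross_entropy(p, q, alpha):
    """Renyi cross-entropy of order alpha between discrete distributions
    p and q, using the assumed definition
    (1 / (1 - alpha)) * log(sum_x p(x) * q(x)**(alpha - 1))."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = sum(pi * qi ** (alpha - 1) for pi, qi in zip(p, q))
    return math.log(total) / (1 - alpha)


def shannon_cross_entropy(p, q):
    """Shannon cross-entropy -sum_x p(x) log q(x), the alpha -> 1 limit."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    d_fake = [0.5, 1.0]  # made-up discriminator outputs
    print(lk_generator_loss(d_fake, k=1.0))
    print(lk_generator_loss(d_fake, k=2.0))

    p, q = [0.5, 0.5], [0.25, 0.75]  # made-up distributions
    print(renyi_cross_entropy(p, q, alpha=1.0001))
    print(shannon_cross_entropy(p, q))
```

Varying `k` or `alpha` in this sketch changes how strongly large deviations are weighted, which is the kind of extra degree of freedom the abstract refers to.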