Emergent communication of multimodal deep generative models based on Metropolis-Hastings naming game

Front Robot AI. 2024 Jan 31;10:1290604. doi: 10.3389/frobt.2023.1290604. eCollection 2023.

Abstract

Deep generative models (DGMs) are increasingly employed in emergent communication systems. However, their application to multimodal data remains limited. This study proposes a novel model that combines a multimodal DGM with the Metropolis-Hastings (MH) naming game, enabling two agents to attend jointly to a shared subject and develop a common vocabulary. The model proves capable of handling multimodal data, even when some modalities are missing. Integrating the MH naming game with multimodal variational autoencoders (VAEs) allows agents to form perceptual categories and exchange signs within multimodal contexts. Moreover, tuning the weight ratio to favor the modality that the model could learn and categorize more readily improved communication. Our evaluation of three multimodal fusion approaches, mixture-of-experts (MoE), product-of-experts (PoE), and mixture-of-products-of-experts (MoPoE), suggests that the choice of approach shapes the latent spaces that serve as the agents' internal representations. Our results from experiments with the MNIST + SVHN and Multimodal165 datasets indicate that combining the Gaussian mixture model (GMM), the PoE multimodal VAE, and the MH naming game substantially improved information sharing, knowledge formation, and data reconstruction.
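To make the two named components concrete, the following is a minimal sketch, not the authors' implementation, of (a) product-of-experts fusion of per-modality Gaussian posteriors and (b) one Metropolis-Hastings naming-game round between a speaker and a listener. The agent interface (`sign_posterior`, `current_sign`), array shapes, and the inclusion of a unit-Gaussian prior expert are assumptions made for illustration only.

```python
# Hedged sketch (not the paper's code): PoE fusion of modality-wise Gaussian
# posteriors and a single Metropolis-Hastings naming-game round.
import numpy as np

def poe_fuse(mus, logvars):
    """Fuse modality-wise Gaussian posteriors q_m(z|x_m) by product of experts.

    Precisions add; the fused mean is the precision-weighted average of the
    expert means. A standard-normal prior expert N(0, I) is included here,
    which is a common (assumed) choice for multimodal VAEs.
    `mus` / `logvars` are lists of arrays of shape (latent_dim,).
    """
    mus = [np.zeros_like(mus[0])] + list(mus)          # prior expert mean 0
    logvars = [np.zeros_like(logvars[0])] + list(logvars)  # prior log-var 0
    precisions = [np.exp(-lv) for lv in logvars]
    fused_var = 1.0 / np.sum(precisions, axis=0)
    fused_mu = fused_var * np.sum([p * m for p, m in zip(precisions, mus)], axis=0)
    return fused_mu, np.log(fused_var)

def mh_naming_round(speaker, listener, obs_speaker, obs_listener, rng):
    """One naming-game round with a Metropolis-Hastings acceptance step.

    Both agents are assumed to expose:
      - sign_posterior(obs): categorical probabilities over signs given the
        agent's own observation of the shared object, and
      - current_sign: the sign the agent currently assigns to that object.
    """
    # Speaker samples a proposed sign from its own posterior over signs.
    p_speak = speaker.sign_posterior(obs_speaker)
    proposed = rng.choice(len(p_speak), p=p_speak)

    # Listener compares the proposal with its current sign and accepts it
    # with the MH acceptance probability min(1, ratio of its own posteriors).
    p_listen = listener.sign_posterior(obs_listener)
    ratio = p_listen[proposed] / max(p_listen[listener.current_sign], 1e-12)
    accept_prob = min(1.0, ratio)

    if rng.random() < accept_prob:
        listener.current_sign = proposed  # proposal adopted into shared lexicon
    return proposed, accept_prob
```

In this reading, repeated rounds with alternating speaker and listener roles drive the two agents' sign assignments toward agreement without either agent ever inspecting the other's internal latent representation; only the proposed sign is exchanged.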

Keywords: Metropolis-Hastings; deep generative model; emergent communication; multimodal; naming game; symbol emergence; variational autoencoder.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported by the Establishment of University Fellowships Towards the Creation of Science Technology Innovation, Grant Number JPMJFS2146. This work was also supported by JSPS KAKENHI Grant Numbers JP21H04904 and JP23H04835.