FET-LM: Flow-Enhanced Variational Autoencoder for Topic-Guided Language Modeling

IEEE Trans Neural Netw Learn Syst. 2023 Mar 8:PP. doi: 10.1109/TNNLS.2023.3249253. Online ahead of print.

Abstract

The variational autoencoder (VAE) is widely used in unsupervised text generation for its ability to learn meaningful latent spaces, but it typically assumes that texts follow a simple yet poorly expressive isotropic Gaussian distribution. In real-life scenarios, sentences with different semantics rarely follow a simple isotropic Gaussian; they are far more likely to follow an intricate, diverse distribution because texts span heterogeneous topics. With this in mind, we propose a flow-enhanced VAE for topic-guided language modeling (FET-LM). FET-LM models the topic and sequence latent variables separately, and it adopts a normalizing flow composed of Householder transformations to model the sequence posterior, which can better approximate complex text distributions. FET-LM further leverages a neural latent topic component that takes the learned sequence knowledge into account, which not only eases the burden of learning topics without supervision but also guides the sequence component to incorporate topic information during training. To make the generated texts correlate more strongly with their topics, we additionally let the topic encoder play the role of a discriminator. Encouraging results on a broad set of automatic metrics and three generation tasks demonstrate that FET-LM not only learns interpretable sequence and topic representations but is also fully capable of generating semantically consistent, high-quality paragraphs.
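As a concrete illustration of the posterior flow mentioned above, the following is a minimal PyTorch sketch of a Householder flow. This is not the authors' code: the class name, the direct parameterization of the reflection vectors, and the default of four flow steps are assumptions, and FET-LM may instead predict each reflection vector from the encoder hidden state. Each step applies an orthogonal reflection H = I - 2vv^T/||v||^2 to the latent sample, so the Jacobian determinant has absolute value 1 and the KL term of the VAE objective needs no log-determinant correction.

    import torch
    import torch.nn as nn

    class HouseholderFlow(nn.Module):
        """Stack of Householder reflections z_k = H_k z_{k-1}.

        Each H = I - 2 v v^T / ||v||^2 is orthogonal, so the flow is
        volume-preserving (zero log-det-Jacobian) and turns an isotropic
        Gaussian posterior sample into one with full covariance.
        """

        def __init__(self, latent_dim: int, num_flows: int = 4):
            super().__init__()
            # One learnable reflection vector per flow step (an assumption;
            # conditioning v on the encoder hidden state is also common).
            self.vs = nn.Parameter(torch.randn(num_flows, latent_dim))

        def forward(self, z: torch.Tensor) -> torch.Tensor:
            # z: (batch, latent_dim) sample from the base Gaussian posterior.
            for v in self.vs:
                v = v / v.norm()                          # unit reflection axis
                z = z - 2.0 * (z @ v).unsqueeze(-1) * v   # z <- (I - 2 v v^T) z
            return z

Because each reflection is orthogonal, a stack of K reflections can realize a rotation of the latent space at a cost of only K x latent_dim extra parameters, which is what makes Householder flows an inexpensive way to move beyond a diagonal-covariance posterior.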