A Decoder-Free Variational Deep Embedding for Unsupervised Clustering

IEEE Trans Neural Netw Learn Syst. 2022 Oct;33(10):5681-5693. doi: 10.1109/TNNLS.2021.3071275. Epub 2022 Oct 5.

Abstract

In deep clustering frameworks, autoencoder (AE)- and variational AE-based approaches are the most popular and competitive: they encourage the model to learn suitable representations while avoiding degenerate solutions. However, for the clustering task, the decoder that reconstructs the original input is usually useless once the model has been trained, and the encoder-decoder architecture limits the depth of the encoder, severely reducing its learning capacity. In this article, we propose a decoder-free variational deep embedding for unsupervised clustering (DFVC). It is well known that minimizing the reconstruction error amounts to maximizing a lower bound on the mutual information (MI) between the input and its representation, which provides a theoretical justification for discarding the bloated decoder. Inspired by contrastive self-supervised learning, we directly compute or estimate the MI of the continuous variables. Specifically, we investigate unsupervised representation learning by jointly considering MI estimation for continuous representations and MI computation for categorical representations. By introducing data augmentation, we incorporate the original input, the augmented input, and their high-level representations into the MI estimation framework to learn more discriminative representations. Instead of adversarially matching the latent space to a simple standard normal distribution, we constrain it to be cluster-friendly through end-to-end learning with a Gaussian mixture prior. Extensive experiments on challenging data sets show that our model outperforms a wide range of state-of-the-art clustering approaches.
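
The abstract describes three kinds of objective terms: an MI estimate between continuous representations of an input and its augmented view, an exact MI computation between their categorical cluster assignments, and a Gaussian-mixture prior on the latent space. The sketch below is a minimal, illustrative rendering of such terms and is not the authors' implementation: the specific estimator choices (an InfoNCE-style contrastive bound for the continuous MI, an IIC-style joint-distribution computation for the categorical MI, a Monte-Carlo KL to a Gaussian mixture prior) and all function and variable names are assumptions made only to make the ideas concrete.

```python
# Illustrative sketch only; estimators, names, and shapes are assumptions,
# not the paper's actual implementation.
import torch
import torch.nn.functional as F


def mi_continuous(z, z_aug, temperature=0.1):
    """InfoNCE-style lower bound on MI between continuous embeddings of an
    input batch and its augmented view (one possible contrastive estimator)."""
    z = F.normalize(z, dim=1)
    z_aug = F.normalize(z_aug, dim=1)
    logits = z @ z_aug.t() / temperature            # (N, N) similarity matrix
    labels = torch.arange(z.size(0), device=z.device)
    return -F.cross_entropy(logits, labels)         # higher = more MI retained


def mi_categorical(p, p_aug, eps=1e-8):
    """Exact MI between the categorical cluster assignments of the two views,
    computed from their empirical joint distribution (IIC-style).
    p, p_aug: (N, K) soft assignment probabilities (softmax outputs)."""
    joint = p.t() @ p_aug / p.size(0)               # (K, K) joint over clusters
    joint = (joint + joint.t()) / 2                 # symmetrize
    marg_r = joint.sum(dim=1, keepdim=True)         # (K, 1) marginal
    marg_c = joint.sum(dim=0, keepdim=True)         # (1, K) marginal
    return (joint * (torch.log(joint + eps)
                     - torch.log(marg_r + eps)
                     - torch.log(marg_c + eps))).sum()


def kl_to_gmm_prior(z, mu, logvar, pi, means, logvars):
    """Monte-Carlo estimate of KL(q(z|x) || GMM prior), encouraging a
    cluster-friendly latent space instead of a single standard normal.
    z: (N, D) reparameterized samples from q(z|x) with parameters mu, logvar;
    pi: (K,), means/logvars: (K, D) mixture parameters."""
    log_q = (-0.5 * (logvar + (z - mu) ** 2 / logvar.exp())).sum(dim=1)
    z_e = z.unsqueeze(1)                            # (N, 1, D) for broadcasting
    log_comp = (-0.5 * (logvars + (z_e - means) ** 2 / logvars.exp())).sum(dim=2)
    log_p = torch.logsumexp(torch.log(pi + 1e-8) + log_comp, dim=1)
    # The -0.5 * D * log(2*pi) constants in log_q and log_p cancel here.
    return (log_q - log_p).mean()
```

Under these assumptions, a training loss could combine the terms as, e.g., `loss = -mi_continuous(z, z_aug) - mi_categorical(p, p_aug) + beta * kl_to_gmm_prior(z, mu, logvar, pi, means, logvars)`, with `beta` a weighting hyperparameter; the paper's actual estimators, weighting, and architecture may differ.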