Supervised Determined Source Separation with Multichannel Variational Autoencoder

Hirokazu Kameoka; Li Li; Shota Inoue; Shoji Makino

doi:10.1162/neco_a_01217

Neural Comput. 2019 Sep;31(9):1891-1914. doi: 10.1162/neco_a_01217. Epub 2019 Jul 23.

Authors

Hirokazu Kameoka¹, Li Li², Shota Inoue³, Shoji Makino⁴

Affiliations

¹ Nippon Telegraph and Telephone Corporation, Kanagawa, 243-0198, Japan (hirokazu.kameoka.uh@hco.ntt.co.jp)
² University of Tsukuba, Ibaraki, 305-8577, Japan (lili@mmlab.cs.tsukuba.ac.jp)
³ University of Tsukuba, Ibaraki, 305-8577, Japan (s1920622@s.tsukuba.ac.jp)
⁴ University of Tsukuba, Ibaraki, 305-8577, Japan (maki@tara.tsukuba.ac.jp)

PMID: 31335290

Abstract

This letter proposes a multichannel source separation technique, the multichannel variational autoencoder (MVAE) method, which uses a conditional VAE (CVAE) to model and estimate the power spectrograms of the sources in a mixture. By training the CVAE using the spectrograms of training examples with source-class labels, we can use the trained decoder distribution as a universal generative model capable of generating spectrograms conditioned on a specified class index. By treating the latent space variables and the class index as the unknown parameters of this generative model, we can develop a convergence-guaranteed algorithm for supervised determined source separation that consists of iteratively estimating the power spectrograms of the underlying sources, as well as the separation matrices. In experimental evaluations, our MVAE produced better separation performance than a baseline method.
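
The alternating structure described in the abstract (estimate the source power spectrograms, then the separation matrices, and repeat) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the abstract does not confirm: a local Gaussian source model, a separation-matrix update by iterative projection (a rule commonly used in determined source separation), and a simple frequency-independent variance estimate standing in for the trained CVAE decoder, whose latent variables and class index would normally be fitted at that step. The function names and array shapes are hypothetical and chosen for illustration only.

```python
import numpy as np

def neg_log_likelihood(X, W, V):
    """Objective under an assumed local Gaussian model.
    X: (F, N, I) mixture STFT, W: (F, I, I) separation matrices,
    V: (F, N, I) source variances (power spectrogram estimates)."""
    Y = np.einsum('fij,fnj->fni', W, X)          # separated signals
    _, logabsdet = np.linalg.slogdet(W)          # log|det W(f)| per frequency
    n_frames = X.shape[1]
    return np.sum(np.log(V) + np.abs(Y) ** 2 / V) - 2 * n_frames * np.sum(logabsdet)

def standin_variances(Y, eps=1e-12):
    """Stand-in for the CVAE decoder (assumption): one variance per source
    and frame, shared across frequencies."""
    P = np.abs(Y) ** 2
    r = np.maximum(P.mean(axis=0, keepdims=True), eps)
    return r * np.ones_like(P)

def ip_update(X, W, V):
    """One sweep of iterative projection over all sources."""
    F, N, I = X.shape
    W = W.astype(complex)                        # astype returns a copy
    eye = np.eye(I)
    for j in range(I):
        # Weighted covariance U(f) = (1/N) sum_n x x^H / v_j(f, n)
        U = np.einsum('fni,fnk->fik', X / V[:, :, j:j + 1], X.conj()) / N
        # Solve (W U) w = e_j for every frequency bin
        rhs = np.tile(eye[:, j:j + 1], (F, 1, 1))
        w = np.linalg.solve(W @ U, rhs)[..., 0]
        # Normalize so that w^H U w = 1
        scale = np.sqrt(np.einsum('fi,fik,fk->f', w.conj(), U, w).real)
        w = w / scale[:, None]
        W[:, j, :] = w.conj()                    # row j of W is w_j^H
    return W

def separate(X, n_iter=20):
    """Alternate variance estimation and separation-matrix updates."""
    F, N, I = X.shape
    W = np.tile(np.eye(I, dtype=complex), (F, 1, 1))
    costs = []
    for _ in range(n_iter):
        Y = np.einsum('fij,fnj->fni', W, X)
        V = standin_variances(Y)                 # power spectrogram step
        W = ip_update(X, W, V)                   # separation matrix step
        costs.append(neg_log_likelihood(X, W, V))
    return W, costs
```

Because each of the two steps in this sketch minimizes the same objective over its own block of parameters, the recorded cost cannot increase from one iteration to the next, which is the kind of property a convergence guarantee for such an alternating scheme relies on. In the method described above, the variance step would instead update the parameters of the trained decoder.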

Publication types

Research Support, Non-U.S. Gov't