Modulation spectrum-constrained trajectory error training for mixture density network-based speech synthesis

J Acoust Soc Am. 2018 Sep;144(3):EL151. doi: 10.1121/1.5052206.

Abstract

In statistical parametric speech synthesis, a mixture density network is employed to address the limitations of a linear output layer, such as pre-computed fixed variances and the unimodal assumption. However, a mixture density network has a drawback of its own: it cannot impose the static-dynamic constraint needed in the training phase for high-quality speech synthesis. To address this problem, this paper proposes a training algorithm for mixture density networks based on the minimum trajectory error. A modulation spectrum-constrained loss function is also proposed to alleviate the over-smoothing effect. The experimental results confirm meaningful improvement in both objective and subjective performance measures.
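
As a rough illustration of what a modulation spectrum-constrained loss might look like, the sketch below adds a modulation-spectrum distance to a plain trajectory error. The abstract does not give the paper's exact formulation, so everything here is an assumption: the modulation spectrum is taken to be the log power spectrum of each feature dimension along time, and the names `modulation_spectrum`, `ms_constrained_loss`, `weight`, and `n_fft` are hypothetical. The mixture density network and the static-dynamic constraint are not modeled.

```python
import numpy as np


def modulation_spectrum(traj, n_fft=64):
    """Log power spectrum along the time axis for each feature dimension.

    traj: array of shape (T, D), one speech-parameter trajectory.
    The time axis is zero-padded or cropped to n_fft frames (assumed choice).
    Returns an array of shape (n_fft // 2 + 1, D).
    """
    spec = np.fft.rfft(traj, n=n_fft, axis=0)
    # A small floor avoids log(0) for empty frequency bins.
    return np.log(np.abs(spec) ** 2 + 1e-10)


def ms_constrained_loss(pred, target, weight=0.1, n_fft=64):
    """Trajectory mean squared error plus a weighted modulation-spectrum term.

    Assumed form for illustration only, not the paper's definition.
    """
    traj_err = np.mean((pred - target) ** 2)
    ms_diff = modulation_spectrum(pred, n_fft) - modulation_spectrum(target, n_fft)
    ms_err = np.mean(ms_diff ** 2)
    return traj_err + weight * ms_err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal((50, 3))  # stand-in for natural parameters
    pred = 0.5 * target                    # stand-in for a flattened prediction
    print(ms_constrained_loss(pred, target))
```

In this sketch, a prediction with reduced temporal variation has lower modulation-spectrum energy than the target, so the second term penalizes it beyond what the trajectory error alone would.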

Publication types

  • Research Support, Non-U.S. Gov't