Modulation spectrum-constrained trajectory error training for mixture density network-based speech synthesis

Sangjun Park; Minsoo Hahn

doi:10.1121/1.5052206

J Acoust Soc Am. 2018 Sep;144(3):EL151. doi: 10.1121/1.5052206.

Authors

Sangjun Park¹, Minsoo Hahn¹

Affiliation

¹ School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea. psj@kaist.ac.kr, mshahn2@kaist.ac.kr

PMID: 30424621

Abstract

In statistical parametric speech synthesis, a mixture density network is employed to address the limitations of a linear output layer, such as pre-computed fixed variances and the unimodal assumption. However, a mixture density network has its own drawback: it cannot apply the static-dynamic constraint that is needed in the training phase for high-quality speech synthesis. To cope with this problem, this paper proposes a training algorithm for mixture density networks based on the minimum trajectory error. A modulation spectrum-constrained loss function is also proposed to alleviate the over-smoothing effect. The experimental results confirm meaningful improvements in both objective and subjective performance measures.
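
As background on the mixture density output layer mentioned in the abstract, the following is a minimal sketch of the standard negative log-likelihood of a one-dimensional Gaussian mixture. It is not the training algorithm proposed in the paper, and it includes neither the trajectory error nor the modulation spectrum constraint. The function name `gmm_nll` and all numeric values are hypothetical, chosen only for illustration; in a real network the mixture weights, means, and variances would be predicted per frame rather than passed in by hand.

```python
import math


def gmm_nll(y, weights, means, variances):
    """Negative log-likelihood of scalar y under a 1-D Gaussian mixture.

    Assumption: weights are positive and sum to 1, variances are positive.
    In a mixture density network these would be network outputs.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log of the Gaussian normalising constant for this component.
        log_norm = -0.5 * math.log(2.0 * math.pi * var)
        log_terms.append(math.log(w) + log_norm - 0.5 * (y - mu) ** 2 / var)
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))


# Hypothetical bimodal output: two equally weighted modes at -1 and +1.
weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [0.1, 0.1]
nll_at_mode = gmm_nll(1.0, weights, means, variances)
nll_between_modes = gmm_nll(0.0, weights, means, variances)
```

With these illustrative values, a target on one of the modes receives a lower loss than a target halfway between them. A single Gaussian with a fixed variance cannot express this behaviour, which is the unimodal limitation the abstract refers to.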

Publication types

Research Support, Non-U.S. Gov't