A constraint-based evolutionary learning approach to the expectation maximization for optimal estimation of the hidden Markov model for speech signal modeling

Shamsul Huda; John Yearwood; Roberto Togneri

doi:10.1109/TSMCB.2008.2004051

IEEE Trans Syst Man Cybern B Cybern. 2009 Feb;39(1):182-97. doi: 10.1109/TSMCB.2008.2004051. Epub 2008 Dec 9.

Authors

Shamsul Huda¹, John Yearwood, Roberto Togneri

Affiliation

¹ Center for Informatics and Applied Optimization, University of Ballarat, Ballarat, Vic. 3350, Australia. shuda@ballarat.edu.au

PMID: 19068441

Abstract

This paper addresses the tendency of the expectation-maximization (EM) algorithm to converge to a local rather than a global maximum when it is used to estimate hidden Markov model (HMM) parameters in speech signal modeling. We propose a hybrid algorithm, CEL-EM, that combines a constraint-based evolutionary algorithm (EA) with EM to estimate the HMM in automatic speech recognition (ASR). The novelty of CEL-EM is that it is applicable to EM-estimated models, such as the HMM, that have many constraints and large numbers of parameters. Two constraint-based versions of CEL-EM with different fusion strategies are proposed. The first uses a traditional constraint-handling mechanism of the EA. The second transforms the constrained optimization problem into an unconstrained one using Lagrange multipliers. The fusion strategies follow a staged-fusion approach in which EM is applied periodically, after the EA has run for a specified period, so that the hybrid algorithm retains the global sampling capability of the EA. A variable initialization approach (VIA), based on variable segmentation, is proposed to provide a better initialization for the EA in CEL-EM. Experimental results on the TIMIT speech corpus show that CEL-EM achieves higher recognition accuracy than both the traditional EM algorithm and a top-standard EM baseline (VIA-EM, constructed by applying the VIA to EM).
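The staged-fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' CEL-EM implementation: to stay short it swaps the HMM for a one-dimensional Gaussian mixture, which is also estimated with EM and also has a sum-to-one constraint on its weights. The function names (`cel_em_sketch`, `repair`, `em_step`), the repair-style constraint handling, the mutation scales, and all default parameters are assumptions chosen for illustration.

```python
import math
import random


def log_likelihood(data, params):
    """Log-likelihood of 1-D data under a Gaussian mixture (weights, means, variances)."""
    weights, means, variances = params
    total = 0.0
    for x in data:
        p = sum(w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))  # guard against underflow to zero
    return total


def repair(params):
    """Assumed constraint handling: project weights onto the simplex, floor variances."""
    weights, means, variances = params
    weights = [max(w, 1e-6) for w in weights]
    s = sum(weights)
    return ([w / s for w in weights], list(means), [max(v, 1e-3) for v in variances])


def em_step(data, params):
    """One EM iteration for the mixture; a component with no support keeps its old values."""
    weights, means, variances = params
    k = len(weights)
    r_sum, x_sum, xx_sum = [0.0] * k, [0.0] * k, [0.0] * k
    for x in data:
        dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)]
        z = sum(dens)
        if z <= 0.0:
            continue
        for j in range(k):
            r = dens[j] / z  # responsibility of component j for x (E-step)
            r_sum[j] += r
            x_sum[j] += r * x
            xx_sum[j] += r * x * x
    new_w, new_m, new_v = list(weights), list(means), list(variances)
    for j in range(k):  # M-step
        if r_sum[j] > 1e-12:
            new_w[j] = r_sum[j] / len(data)
            new_m[j] = x_sum[j] / r_sum[j]
            new_v[j] = xx_sum[j] / r_sum[j] - new_m[j] ** 2
    return repair((new_w, new_m, new_v))


def cel_em_sketch(data, k=2, pop_size=8, generations=30, ea_period=5, em_steps=3, seed=0):
    """Run an EA with repair; every `ea_period` generations refine each individual with EM."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)

    def fitness(p):
        return log_likelihood(data, p)

    def random_individual():
        return repair(([rng.random() for _ in range(k)],
                       [rng.uniform(lo, hi) for _ in range(k)],
                       [rng.uniform(0.5, 2.0) for _ in range(k)]))

    def mutate(p):
        w, m, v = p
        return repair(([a + rng.gauss(0, 0.05) for a in w],
                       [a + rng.gauss(0, 0.3) for a in m],
                       [a + rng.gauss(0, 0.1) for a in v]))

    pop = [random_individual() for _ in range(pop_size)]
    initial_best = max(fitness(p) for p in pop)
    for gen in range(1, generations + 1):
        # EA stage: truncation selection (keeps the elite) plus mutated offspring.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = [mutate(rng.choice(parents)) for _ in range(pop_size - len(parents))]
        pop = parents + children
        # Fusion stage: EM is applied only periodically, after the EA has run for a while.
        if gen % ea_period == 0:
            for i, ind in enumerate(pop):
                refined = ind
                for _ in range(em_steps):
                    refined = em_step(data, refined)
                pop[i] = max(ind, refined, key=fitness)
    best = max(pop, key=fitness)
    return best, fitness(best), initial_best
```

A possible usage, on synthetic data drawn from two Gaussians: `data = [random.gauss(-2, 1) for _ in range(60)] + [random.gauss(3, 1) for _ in range(60)]`, then `best, ll, ll0 = cel_em_sketch(data)`. Because the elite individuals survive each generation and an EM-refined individual replaces the original only if its likelihood is higher, the best log-likelihood `ll` never falls below the initial best `ll0`. Running the EA alone between EM stages is what gives the population time to explore before EM pulls each candidate toward a nearby local maximum.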

MeSH terms

Algorithms
Artificial Intelligence*
Humans
Markov Chains*
Models, Statistical
Normal Distribution
Pattern Recognition, Automated / methods*
Reproducibility of Results
Speech Recognition Software*