Dynamic time warping in phoneme modeling for fast pronunciation error detection

Comput Biol Med. 2016 Feb 1:69:277-85. doi: 10.1016/j.compbiomed.2015.12.004. Epub 2015 Dec 17.

Abstract

This paper describes a novel approach to detecting pronunciation errors. It models well-pronounced and mispronounced phonemes using the Dynamic Time Warping (DTW) algorithm. Four methods built on DTW phoneme modeling were developed to detect pronunciation errors: Variations of the Word Structure (VoWS), Normalized Phoneme Distances Thresholding (NPDT), Furthest Segment Search (FSS) and Normalized Furthest Segment Search (NFSS). Each method was evaluated on a speech database of correctly and incorrectly pronounced Polish words, with up to 10 patterns for every trained word from a set of 12 words having different phonetic structures. DTW modeling was compared with Hidden Markov Models (HMM) applied to the same four methods (VoWS, NPDT, FSS, NFSS). The lowest average error rate (AER) was obtained for DTW with NPDT (AER=0.287), which outperformed HMM with FSS (AER=0.473), the best result for HMM. DTW modeling was also faster than HMM for all four methods. The technique can be used in computer-assisted pronunciation training systems that work with a relatively small training speech corpus (fewer than 20 patterns per word) to support speech therapy at home.
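To illustrate the kind of computation the abstract refers to, the following minimal sketch shows textbook DTW between two sequences of feature vectors, followed by a simple length-normalized distance threshold. It is not the authors' implementation: the function names (dtw_distance, normalized_dtw, is_mispronounced), the Euclidean frame distance, the normalization by the summed sequence lengths, and the threshold value are all assumptions made for illustration, and the abstract does not specify how NPDT is actually defined.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of equal-dimension feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance (assumed)
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of a repeated against b
                cost[i][j - 1],      # frame of b repeated against a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def normalized_dtw(a, b):
    """DTW cost divided by the summed lengths (one common normalization, assumed here)."""
    return dtw_distance(a, b) / (len(a) + len(b))


def is_mispronounced(segment, templates, threshold):
    """Flag a segment whose closest reference template is still farther than the threshold.

    `templates` stands in for well-pronounced reference patterns; `threshold`
    is a hypothetical tuning parameter, not a value from the paper.
    """
    best = min(normalized_dtw(segment, t) for t in templates)
    return best > threshold


if __name__ == "__main__":
    # Toy one-dimensional "features"; real systems would use spectral frames.
    reference = [(0.0,), (1.0,), (2.0,)]
    stretched = [(0.0,), (1.0,), (1.0,), (2.0,)]  # same shape, slower tempo
    print(dtw_distance(reference, stretched))
    print(is_mispronounced([(5.0,), (6.0,)], [reference], threshold=1.0))
```

Because DTW allows frames to repeat along the alignment path, a slower or faster rendition of the same pattern stays close to its template, which is why a small set of reference patterns per word can be enough for template matching of this kind.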

Keywords: CAPT systems; DTW algorithm; Phoneme modeling; Pronunciation error detection; Word structure analysis.

MeSH terms

  • Algorithms*
  • Models, Biological*
  • Phonetics*
  • Poland
  • Speech Disorders / diagnosis*