Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation

IEEE/ACM Trans Audio Speech Lang Process. 2023:31:86-95. doi: 10.1109/taslp.2022.3209937. Epub 2022 Oct 10.

Abstract

Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of precise articulation, dubbed the objective articulation measure (OAM), by analyzing the CV transitions segmented around vowel onsets. The OAM is derived based on the posteriors of a convolutional neural network pre-trained to classify between different consonants using CV regions as input. We demonstrate that the OAM is correlated with perceptual measures in a variety of contexts including (a) adult dysarthric speech, (b) the speech of children with cleft lip/palate, and (c) a database of accented English speech from native Mandarin and Spanish speakers.

Keywords: Articulation precision; cleft lip and palate; consonant-vowel transitions; convolutional neural networks; dysarthria; pronunciation scores; second language learning.
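
The abstract says only that the OAM is derived from the posteriors of a consonant classifier applied to CV regions; it does not give the formula. As a rough illustration of that general idea, the sketch below turns per-segment classifier outputs into a single score by averaging the posterior assigned to the intended consonant. Everything here is an assumption made for illustration: the function names (`softmax`, `articulation_score`), the three-consonant inventory, the example logits, and the averaging rule itself, which may differ from the paper's actual definition.

```python
import math


def softmax(logits):
    """Convert raw classifier outputs into posterior probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def articulation_score(segment_logits, target_indices):
    """Mean posterior of the intended consonant over all CV segments.

    Hypothetical scoring rule, not the paper's definition of the OAM.
    segment_logits: one list of classifier outputs per CV segment.
    target_indices: index of the intended consonant for each segment.
    """
    if not segment_logits:
        raise ValueError("at least one CV segment is required")
    if len(segment_logits) != len(target_indices):
        raise ValueError("one target consonant is required per segment")
    posteriors = [
        softmax(logits)[target]
        for logits, target in zip(segment_logits, target_indices)
    ]
    return sum(posteriors) / len(posteriors)


# Hypothetical consonant inventory and classifier outputs for two CV segments.
CONSONANTS = ["p", "t", "k"]
targets = [CONSONANTS.index("p"), CONSONANTS.index("t")]

# Clearly articulated speech: the classifier is confident in the intended consonant.
precise_logits = [[4.0, 0.5, 0.2], [0.1, 3.5, 0.3]]
# Imprecise speech: the classifier outputs are nearly flat across consonants.
imprecise_logits = [[1.0, 0.9, 0.8], [0.7, 0.8, 0.9]]

precise_score = articulation_score(precise_logits, targets)
imprecise_score = articulation_score(imprecise_logits, targets)
print(f"precise: {precise_score:.3f}, imprecise: {imprecise_score:.3f}")
```

Under this assumed rule the score lies between 0 and 1, and segments whose consonant the classifier identifies confidently push it toward 1. In practice the logits would come from the pre-trained convolutional network described in the abstract rather than from hand-written lists.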