Convolutional neural network-based automatic classification of midsagittal tongue gestural targets using B-mode ultrasound images

J Acoust Soc Am. 2017 Jun;141(6):EL531. doi: 10.1121/1.4984122.

Abstract

Tongue gestural target classification is of great interest to researchers in the field of speech production. Recently, deep convolutional neural networks (CNNs) have outperformed standard feature-extraction techniques in a variety of domains. In this letter, both speaker-dependent and speaker-independent CNN-based tongue gestural target classification experiments are conducted to classify tongue gestures during natural speech production. The CNN-based method achieves state-of-the-art performance, even though, apart from a data-augmentation preprocessing step, no pre-training of the CNN was carried out.
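The abstract mentions data-augmentation preprocessing but does not specify the scheme used. As a hedged illustration only, the sketch below shows the kind of simple augmentation (random translation plus brightness jitter) commonly applied to grayscale image frames such as B-mode ultrasound before CNN training; the function name, parameters, and transformations are hypothetical, not the authors' method.

```python
import numpy as np

def augment_frame(frame, rng, max_shift=4, max_gain=0.1):
    """Hypothetical augmentation for one 2-D B-mode ultrasound frame.

    Applies a random integer translation and a multiplicative
    brightness jitter. This is an illustrative stand-in, not the
    preprocessing described in the letter.
    """
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # circular shift as a cheap stand-in for padded translation
    shifted = np.roll(np.roll(frame, dy, axis=0), dx, axis=1)
    gain = 1.0 + rng.uniform(-max_gain, max_gain)
    return np.clip(shifted * gain, 0.0, 1.0)

rng = np.random.default_rng(0)
frame = rng.random((64, 64))  # stand-in for a single ultrasound frame
# expand one frame into a small augmented batch
batch = np.stack([augment_frame(frame, rng) for _ in range(8)])
print(batch.shape)
```

Each call produces a slightly perturbed copy of the input frame, so a small labeled dataset can be expanded before training without collecting new recordings.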

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomechanical Phenomena
  • Deep Learning
  • Female
  • Gestures*
  • Humans
  • Male
  • Neural Networks, Computer*
  • Pattern Recognition, Automated
  • Signal Processing, Computer-Assisted*
  • Speech Acoustics*
  • Tongue / diagnostic imaging*
  • Tongue / physiology*
  • Ultrasonography / methods*
  • Voice Quality*