Convolutional neural network-based automatic classification of midsagittal tongue gestural targets using B-mode ultrasound images

J Acoust Soc Am. 2017 Jun;141(6):EL531. doi: 10.1121/1.4984122.

Abstract

Tongue gestural target classification is of great interest to researchers in the field of speech production. Recently, deep convolutional neural networks (CNNs) have outperformed standard feature-extraction techniques in a variety of domains. In this letter, both speaker-dependent and speaker-independent CNN-based tongue gestural target classification experiments are conducted to classify tongue gestures during natural speech production. The CNN-based method achieves state-of-the-art performance, even though, apart from a data-augmentation preprocessing step, no pre-training of the CNN was carried out.
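The abstract mentions data-augmentation preprocessing but does not specify the scheme used. As a hedged illustration only, the sketch below shows the kind of simple augmentation (random translation plus brightness jitter) commonly applied to grayscale image frames such as B-mode ultrasound before CNN training; the function name, parameters, and transformations are hypothetical, not the authors' method.

```python
import numpy as np

def augment_frame(frame, rng, max_shift=4, max_gain=0.1):
    """Hypothetical augmentation for one 2-D B-mode ultrasound frame.

    Applies a random integer translation and a multiplicative
    brightness jitter. This is an illustrative stand-in, not the
    preprocessing described in the letter.
    """
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # circular shift as a cheap stand-in for padded translation
    shifted = np.roll(np.roll(frame, dy, axis=0), dx, axis=1)
    gain = 1.0 + rng.uniform(-max_gain, max_gain)
    return np.clip(shifted * gain, 0.0, 1.0)

rng = np.random.default_rng(0)
frame = rng.random((64, 64))  # stand-in for a single ultrasound frame
# expand one frame into a small augmented batch
batch = np.stack([augment_frame(frame, rng) for _ in range(8)])
print(batch.shape)
```

Each call produces a slightly perturbed copy of the input frame, so a small labeled dataset can be expanded before training without collecting new recordings.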

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomechanical Phenomena
  • Deep Learning
  • Female
  • Gestures*
  • Humans
  • Male
  • Neural Networks, Computer*
  • Pattern Recognition, Automated
  • Signal Processing, Computer-Assisted*
  • Speech Acoustics*
  • Tongue / diagnostic imaging*
  • Tongue / physiology*
  • Ultrasonography / methods*
  • Voice Quality*