Towards Artificial Speech Therapy: A Neural System for Impaired Speech Segmentation

Int J Neural Syst. 2016 Sep;26(6):1650023. doi: 10.1142/S0129065716500234. Epub 2016 Mar 29.

Abstract

This paper presents a neural system-based technique for segmenting short impaired speech utterances into silent, unvoiced, and voiced sections. Moreover, the proposed technique identifies the points of the voiced speech at which the spectrum becomes steady. The technique thus aims to detect the limited section of the speech that contains information about its potential impairment. This section is of interest to the speech therapist because it corresponds to the possibly incorrect movements of the speech organs (lower lip and tongue with respect to the vocal tract). Two segmentation models that detect and identify the various sections of disordered (impaired) speech signals have been developed and compared. The first uses a combination of four artificial neural networks. The second is based on a support vector machine (SVM). The SVM has been trained by means of an ad hoc nested algorithm whose outer layer is a metaheuristic and whose inner layer is a convex optimization algorithm. Several metaheuristics have been tested and compared, leading to the conclusion that some variants of the compact differential evolution (CDE) algorithm appear to be well suited to this problem. Numerical results show that the SVM model with a radial basis function kernel can effectively detect the portion of speech that is of interest to a therapist. The best performance has been achieved when the system is trained by the nested algorithm whose outer layer is a hybrid population-based/CDE metaheuristic. A population-based approach performs best at isolating silence/noise sections and at detecting unvoiced sections. A compact approach, on the other hand, appears to be well suited to detecting the beginning of the steady state of the voiced signal. Both proposed segmentation models outperformed two modern segmentation techniques based on a Gaussian mixture model and on deep learning.
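The nested training scheme mentioned in the abstract (a metaheuristic outer layer wrapped around a convex inner solver) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the inner layer is a one-dimensional ridge regression with a closed-form solution, standing in for the convex SVM training problem; the outer layer is a generic compact differential evolution loop (a probability vector of means and standard deviations, rand/1 mutation, persistent elitism), not the specific CDE variants the authors compared; and the names `inner_convex_fit`, `validation_error`, `compact_de`, `TRAIN`, and `VALID` are hypothetical.

```python
import math
import random


def inner_convex_fit(train, lam):
    """Inner layer (stand-in): 1-D ridge regression, convex with a closed form."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)


def validation_error(u, train, valid):
    """Outer objective: decode u[0] in [-1, 1] to lam = 10**(3*u[0]),
    solve the inner problem, and score the fit on held-out data."""
    lam = 10.0 ** (3.0 * u[0])
    w = inner_convex_fit(train, lam)
    return sum((y - w * x) ** 2 for x, y in valid) / len(valid)


def sample(mu, sigma, rng, tries=100):
    """Draw one candidate from the probability vector, restricted to [-1, 1]
    by rejection sampling, with clipping as a fallback."""
    point = []
    for m, s in zip(mu, sigma):
        for _ in range(tries):
            x = rng.gauss(m, s)
            if -1.0 <= x <= 1.0:
                break
        else:
            x = max(-1.0, min(1.0, x))
        point.append(x)
    return point


def compact_de(objective, dim, budget=300, virtual_pop=30, f=0.5, cr=0.9, seed=0):
    """Outer layer: compact DE keeps (mu, sigma) per variable instead of a population."""
    rng = random.Random(seed)
    mu = [0.0] * dim
    sigma = [10.0] * dim  # wide spread: close to uniform sampling on [-1, 1]
    elite = sample(mu, sigma, rng)
    elite_fit = objective(elite)
    history = [elite_fit]
    for _ in range(budget):
        # rand/1 mutation on three virtual individuals drawn from the vector
        a, b, c = (sample(mu, sigma, rng) for _ in range(3))
        mutant = [ai + f * (bi - ci) for ai, bi, ci in zip(a, b, c)]
        # binomial crossover with the elite, keeping the trial inside the bounds
        trial = [max(-1.0, min(1.0, m)) if rng.random() < cr else e
                 for m, e in zip(mutant, elite)]
        trial_fit = objective(trial)
        if trial_fit < elite_fit:
            winner, loser = trial, elite
            elite, elite_fit = trial, trial_fit
        else:
            winner, loser = elite, trial
        # shift the probability vector toward the winner and away from the loser
        for i in range(dim):
            old_mu = mu[i]
            mu[i] = old_mu + (winner[i] - loser[i]) / virtual_pop
            var = (sigma[i] ** 2 + old_mu ** 2 - mu[i] ** 2
                   + (winner[i] ** 2 - loser[i] ** 2) / virtual_pop)
            sigma[i] = math.sqrt(max(var, 1e-12))
        history.append(elite_fit)
    return elite, elite_fit, history


# Synthetic toy data (not from the paper): noisy training set, clean validation set.
TRAIN = [(x / 10, 2 * (x / 10) + ((-1) ** x) * 0.3) for x in range(1, 11)]
VALID = [(x / 10 + 0.05, 2 * (x / 10 + 0.05)) for x in range(1, 11)]

best_u, best_err, history = compact_de(
    lambda u: validation_error(u, TRAIN, VALID), dim=1)
print("best lambda:", 10.0 ** (3.0 * best_u[0]), "validation error:", best_err)
```

In an SVM setting, the outer vector would plausibly encode hyperparameters such as the regularization constant and the kernel width, and the inner call would be the convex SVM solver; the compact representation keeps memory use independent of the (virtual) population size, which is the usual motivation for CDE.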

Keywords: Impaired speech; artificial neural network; compact differential evolution; metaheuristics; steady state; support vector machine.

Publication types

  • Comparative Study

MeSH terms

  • Female
  • Humans
  • Male
  • Neural Networks, Computer*
  • Speech
  • Speech Disorders / diagnosis
  • Speech Disorders / physiopathology
  • Speech Disorders / therapy
  • Speech Production Measurement / methods*
  • Speech Therapy / methods*
  • Support Vector Machine*