Bio-inspired evolutionary oral tract shape modeling for physical modeling vocal synthesis

David M Howard; Andy M Tyrrell; Damian T Murphy; Crispin Cooper; Jack Mullen

doi:10.1016/j.jvoice.2007.03.003

J Voice. 2009 Jan;23(1):11-20. doi: 10.1016/j.jvoice.2007.03.003. Epub 2007 Nov 5.

Authors

David M Howard¹, Andy M Tyrrell, Damian T Murphy, Crispin Cooper, Jack Mullen

Affiliation

¹ Intelligent Systems Research Group, Department of Electronics, University of York, Heslington, York, United Kingdom. dh@ohm.york.ac.uk

PMID: 17981014
DOI: 10.1016/j.jvoice.2007.03.003

Abstract

Physical modeling using digital waveguide mesh (DWM) models is an audio synthesis method that has been shown to produce an acoustic output in music synthesis applications that is often described as being "organic," "warm," or "intimate." This paper describes work that takes its inspiration from physical modeling music synthesis and applies it to speech synthesis through a physical modeling mesh model of the human oral tract. Oral tract shapes are found using a computational technique based on the principles of biological evolution. Essential to successful speech synthesis using this method are accurate measurements of the cross-sectional area of the human oral tract, and these are usually derived from magnetic resonance imaging (MRI). However, such images are nonideal because of the lengthy exposure time required (relative to the time of articulation of speech sounds), the local ambient acoustic noise associated with the MRI machine itself, and the required supine position for the subject. An alternative method is described in which a bio-inspired computing technique that simulates the process of evolution is used to evolve oral tract shapes. Using acoustic and excitation data from two adult males and two adult females, this technique is able to produce appropriate oral tract shapes for open vowels, but the shapes it produces for close vowels are less appropriate. This technique has none of the drawbacks associated with MRI, because all it requires from the subject is an acoustic and electrolaryngograph (or electroglottograph) recording. Appropriate oral tract shapes do enable the model to produce excellent quality synthetic speech for vowel sounds, and sounds that involve dynamic oral tract shape changes, such as diphthongs, can also be synthesized using an impedance-mapped technique. Efforts to improve performance for close vowels by reducing mesh quantization had little effect, and further work is required.
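
The idea of evolving a tract shape until its acoustic output matches a target can be sketched in a few lines. The following is an illustration only and is not the authors' implementation. It assumes a simple (mu + lambda) evolutionary loop with Gaussian mutation, and it swaps the DWM for a much simpler lossless concatenated-tube model with an ideal open end at the lips. The target spectrum here is generated from a hypothetical "hidden" area function, whereas the paper derives its targets from acoustic and electrolaryngograph recordings. All function names, parameter values, and physical constants below are assumptions chosen for the example.

```python
import math
import random

SPEED_OF_SOUND = 35000.0   # cm/s, approximate value for warm, humid air (assumed)
AIR_DENSITY = 0.00114      # g/cm^3, approximate (assumed)


def log_spectrum(areas, section_len, freqs):
    """Log-magnitude response (dB) of a lossless tube chain, glottis to lips."""
    spectrum = []
    for f in freqs:
        k = 2.0 * math.pi * f / SPEED_OF_SOUND
        cos_kl, sin_kl = math.cos(k * section_len), math.sin(k * section_len)
        a, b, c, d = 1.0 + 0j, 0j, 0j, 1.0 + 0j   # running chain matrix (identity)
        for area in areas:
            z = AIR_DENSITY * SPEED_OF_SOUND / area   # characteristic impedance
            m_a, m_b, m_c, m_d = cos_kl, 1j * z * sin_kl, 1j * sin_kl / z, cos_kl
            # Right-multiply the running matrix by this section's matrix.
            a, b, c, d = (a * m_a + b * m_c, a * m_b + b * m_d,
                          c * m_a + d * m_c, c * m_b + d * m_d)
        # With zero pressure at the lips, U_lips / U_glottis = 1 / d.
        spectrum.append(-20.0 * math.log10(abs(d) + 1e-6))
    return spectrum


def spectral_error(areas, target, section_len, freqs):
    """Mean squared difference (dB^2) between candidate and target spectra."""
    candidate = log_spectrum(areas, section_len, freqs)
    return sum((x - y) ** 2 for x, y in zip(candidate, target)) / len(target)


def evolve_shape(target, n_sections, section_len, freqs, rng,
                 mu=8, lam=24, generations=40, sigma=0.4,
                 min_area=0.2, max_area=8.0):
    """(mu + lambda) search over area functions; returns (best_shape, history)."""
    def score(shape):
        return spectral_error(shape, target, section_len, freqs)

    population = [[rng.uniform(min_area, max_area) for _ in range(n_sections)]
                  for _ in range(mu)]
    scored = sorted(((score(p), p) for p in population), key=lambda t: t[0])
    history = [scored[0][0]]
    for _ in range(generations):
        children = []
        for _ in range(lam):
            parent = rng.choice(scored)[1]
            # Gaussian mutation, clamped to a plausible range of areas (cm^2).
            child = [min(max_area, max(min_area, area + rng.gauss(0.0, sigma)))
                     for area in parent]
            children.append((score(child), child))
        # Parents compete with children, so the best error never gets worse.
        scored = sorted(scored + children, key=lambda t: t[0])[:mu]
        history.append(scored[0][0])
    return scored[0][1], history


rng = random.Random(0)
freqs = [100.0 * i for i in range(1, 41)]                  # 100 Hz to 4 kHz
hidden_shape = [0.6, 0.9, 1.5, 2.5, 4.0, 5.0, 4.0, 3.0]    # hypothetical areas, cm^2
target = log_spectrum(hidden_shape, 2.0, freqs)            # stand-in for a recording
best_shape, history = evolve_shape(target, len(hidden_shape), 2.0, freqs, rng)
print(f"best spectral error: {history[0]:.2f} -> {history[-1]:.2f} dB^2")
```

Because parents are kept alongside their offspring, the best error in `history` cannot increase from one generation to the next. Note that a good spectral match does not guarantee that the recovered shape equals the hidden one, since different area functions can produce similar spectra.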

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Biological Evolution
Computer Simulation
Female
Humans
Larynx / anatomy & histology*
Male
Models, Biological*
Speech Acoustics*