Closed-set speaker conditioned acoustic-to-articulatory inversion using bi-directional long short term memory network

Aravind Illa; Prasanta Kumar Ghosh

doi:10.1121/10.0000738

J Acoust Soc Am. 2020 Feb;147(2):EL171. doi: 10.1121/10.0000738.

Authors

Aravind Illa¹, Prasanta Kumar Ghosh¹

Affiliation

¹ Electrical Engineering Department, Indian Institute of Science, Bangalore, 560012, India. aravindi@iisc.ac.in, prasantg@iisc.ac.in

PMID: 32113264
DOI: 10.1121/10.0000738

Abstract

Estimating articulatory movements from speech acoustic representations is known as acoustic-to-articulatory inversion (AAI). In this work, a speaker conditioned AAI (SC AAI) is proposed using a bi-directional long short-term memory (LSTM) neural network, where training is performed by pooling acoustic-articulatory data from multiple speakers along with their corresponding speaker identity information. For this work, 7.24 h of multi-speaker acoustic-articulatory data are collected from 20 speakers speaking 460 English sentences. Experiments with 20 speakers indicate that the SC AAI model performs better than the speaker-dependent AAI (SD AAI) model, improving the correlation coefficient between the original and estimated articulatory movements by 0.036 (absolute).
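
The abstract does not say how the speaker identity information is encoded or fed to the network. As a minimal sketch, assuming a one-hot speaker code appended to every acoustic frame (an illustrative choice, not necessarily the authors' method), the pooled input and the correlation-coefficient metric could look like the following. The function names `condition_on_speaker` and `pearson_correlation` are hypothetical, and the bi-directional LSTM itself is omitted.

```python
from math import sqrt


def one_hot(speaker_index, num_speakers):
    """Return a one-hot list identifying one speaker in a closed set."""
    if not 0 <= speaker_index < num_speakers:
        raise ValueError("speaker_index is outside the closed speaker set")
    return [1.0 if i == speaker_index else 0.0 for i in range(num_speakers)]


def condition_on_speaker(acoustic_frames, speaker_index, num_speakers):
    """Append the speaker code to each acoustic feature frame.

    Assumption: conditioning is done by concatenation at the input.
    The abstract only states that speaker identity is supplied during training.
    """
    code = one_hot(speaker_index, num_speakers)
    return [list(frame) + code for frame in acoustic_frames]


def pearson_correlation(original, estimated):
    """Pearson correlation between one original and one estimated trajectory."""
    n = len(original)
    if n != len(estimated) or n < 2:
        raise ValueError("trajectories must have equal length of at least 2")
    mean_o = sum(original) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(original, estimated))
    var_o = sum((o - mean_o) ** 2 for o in original)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    if var_o == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return cov / sqrt(var_o * var_e)


# Toy usage: two 2-dimensional acoustic frames from speaker 1 of 3.
frames = [[0.1, 0.2], [0.3, 0.4]]
conditioned = condition_on_speaker(frames, speaker_index=1, num_speakers=3)
```

In a setup like this, frames from all speakers would be pooled after conditioning, so a single model sees every speaker's data while still being told which speaker produced each utterance. The correlation would typically be computed per articulatory dimension and then averaged.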

Publication types

Research Support, Non-U.S. Gov't