A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images

Yongwan Lim; Asterios Toutios; Yannick Bliesener; Ye Tian; Sajan Goud Lingala; Colin Vaz; Tanner Sorensen; Miran Oh; Sarah Harper; Weiyi Chen; Yoonjeong Lee; Johannes Töger; Mairym Lloréns Monteserin; Caitlin Smith; Bianca Godinez; Louis Goldstein; Dani Byrd; Krishna S Nayak; Shrikanth S Narayanan

doi:10.1038/s41597-021-00976-x

Sci Data. 2021 Jul 20;8(1):187. doi: 10.1038/s41597-021-00976-x.

Authors

Yongwan Lim^#¹, Asterios Toutios^#¹, Yannick Bliesener¹, Ye Tian¹, Sajan Goud Lingala¹, Colin Vaz¹, Tanner Sorensen², Miran Oh², Sarah Harper², Weiyi Chen¹, Yoonjeong Lee², Johannes Töger¹, Mairym Lloréns Monteserin², Caitlin Smith², Bianca Godinez³, Louis Goldstein², Dani Byrd², Krishna S Nayak¹, Shrikanth S Narayanan⁴ ⁵

Affiliations

¹ Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA.
² Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA.
³ Department of Linguistics, California State University Long Beach, Long Beach, California, USA.
⁴ Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA. shri@sipi.usc.edu.
⁵ Department of Linguistics, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, USA. shri@sipi.usc.edu.

^# Contributed equally.

Abstract

Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is, however, limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 participants performing linguistically motivated speech tasks, alongside the corresponding public domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each participant.

Publication types

Dataset
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Adolescent
Adult
Computer Systems
Female
Humans
Larynx / physiology*
Magnetic Resonance Imaging / methods*
Male
Middle Aged
Speech*
Time Factors
Video Recording
Young Adult
