AVbook, a high-frame-rate corpus of narrative audiovisual speech for investigating multimodal speech perception

Enrico Varano; Pierre Guilleminot; Tobias Reichenbach

doi:10.1121/10.0019460

J Acoust Soc Am. 2023 May 1;153(5):3130. doi: 10.1121/10.0019460.

Authors

Enrico Varano¹, Pierre Guilleminot¹, Tobias Reichenbach²

Affiliations

¹ Department of Bioengineering and Centre for Neurotechnology, Imperial College London, London, United Kingdom.
² Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany.

PMID: 37249407
DOI: 10.1121/10.0019460

Abstract

Seeing a speaker's face can help substantially with understanding their speech, particularly in challenging listening conditions. Research into the neurobiological mechanisms behind audiovisual integration has recently begun to employ continuous natural speech. However, these efforts are impeded by a lack of high-quality audiovisual recordings of a speaker narrating a longer text. Here, we seek to close this gap by developing AVbook, an audiovisual speech corpus designed for cognitive neuroscience studies and audiovisual speech recognition. The corpus consists of 3.6 h of audiovisual recordings of two speakers, one male and one female, each reading 59 passages from a narrative English text. The recordings were acquired at a high frame rate of 119.88 frames/s. The corpus includes phone-level alignment files and a set of multiple-choice questions to test attention to the different passages. We verified the efficacy of these questions in a pilot study. A short written summary is also provided for each recording. To enable audiovisual synchronization when presenting the stimuli, four videos of an electronic clapperboard were recorded with the corpus. The corpus is publicly available to support research into the neurobiology of audiovisual speech processing as well as the development of computer algorithms for audiovisual speech recognition.

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Auditory Perception
Female
Humans
Male
Pilot Projects
Speech
Speech Perception*
Visual Perception