A3CarScene: An audio-visual dataset for driving scene understanding

Michela Cantarini; Leonardo Gabrielli; Adriano Mancini; Stefano Squartini; Roberto Longo

doi:10.1016/j.dib.2023.109146

Data Brief. 2023 Apr 12:48:109146. doi: 10.1016/j.dib.2023.109146. eCollection 2023 Jun.

Authors

Michela Cantarini¹, Leonardo Gabrielli¹, Adriano Mancini¹, Stefano Squartini¹, Roberto Longo²,³

Affiliations

¹ Department of Information Engineering, Università Politecnica delle Marche, via Brecce Bianche 12, 60131 Ancona, Italy.
² Groupe Signal Image et Instrumentation (GSII), École Supérieure d'Électronique de l'Ouest (ESEO), 10 Bd Jeanneteau, 49107 Angers, France.
³ Laboratoire d'Acoustique de l'Université du Mans (LAUM), UMR 6613, Institut d'Acoustique - Graduate School (IA-GS), CNRS, Le Mans Université, Av. Olivier Messiaen, 72085 Le Mans, France.

Abstract

Accurate perception and awareness of the environment surrounding the automobile is a challenge in automotive research. This article presents A3CarScene, a dataset recorded while driving a research vehicle equipped with audio and video sensors on public roads in the Marche Region, Italy. The sensor suite includes eight microphones installed inside and outside the passenger compartment and two dashcams mounted on the front and rear windows. Approximately 31 h of data for each device were collected during October and November 2022 by driving about 1500 km along diverse roads and landscapes, in variable weather conditions, in daytime and nighttime hours. All key information for the scene understanding process of automated vehicles has been accurately annotated. For each route, annotations with beginning and end timestamps report the type of road traveled (motorway, trunk, primary, secondary, tertiary, residential, and service roads), the degree of urbanization of the area (city, town, suburban area, village, exurban and rural areas), the weather conditions (clear, cloudy, overcast, and rainy), the level of lighting (daytime, evening, night, and tunnel), the type (asphalt or cobblestones) and moisture status (dry or wet) of the road pavement, and the state of the windows (open or closed). This large-scale dataset is valuable for developing new driving assistance technologies based on audio data, video data, or both combined in a multimodal manner, and for improving the performance of systems currently in use. Because data were acquired with sensors in multiple locations, the dataset also allows the best sensor placement for a given task to be assessed. Deep learning engineers can use this dataset to build new baselines, as a comparative benchmark, and to extend existing databases for autonomous driving.

Keywords: Acoustic and visual scene classification; Advanced driver assistance systems; Artificial neural networks; Audio signal processing; Autonomous vehicles; Computer vision; Multimodal deep learning.
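
The annotation scheme described in the abstract (labels per category, each with beginning and end timestamps) can be illustrated with a minimal Python sketch. The label vocabularies are taken from the abstract; everything else is an assumption made for illustration, including the category names (`road_type`, `window_state`, and so on), the `Segment` class, the `labels_at` helper, the shortened label spellings, and the use of seconds as the time unit. The dataset's actual file format is not described here and may differ.

```python
from dataclasses import dataclass

# Label vocabularies as listed in the abstract.
# The category keys and shortened label spellings are hypothetical.
VOCAB = {
    "road_type": {"motorway", "trunk", "primary", "secondary",
                  "tertiary", "residential", "service"},
    "urbanization": {"city", "town", "suburban", "village", "exurban", "rural"},
    "weather": {"clear", "cloudy", "overcast", "rainy"},
    "lighting": {"daytime", "evening", "night", "tunnel"},
    "pavement_type": {"asphalt", "cobblestones"},
    "pavement_moisture": {"dry", "wet"},
    "window_state": {"open", "closed"},
}


@dataclass(frozen=True)
class Segment:
    """One annotated interval of a route (hypothetical representation)."""
    category: str
    label: str
    start_s: float  # beginning timestamp, assumed to be in seconds
    end_s: float    # end timestamp, assumed to be in seconds

    def __post_init__(self):
        # Reject categories and labels outside the vocabularies above.
        if self.category not in VOCAB:
            raise ValueError(f"unknown category: {self.category}")
        if self.label not in VOCAB[self.category]:
            raise ValueError(f"unknown {self.category} label: {self.label}")
        if not self.start_s < self.end_s:
            raise ValueError("start_s must be smaller than end_s")


def labels_at(segments, t):
    """Return {category: label} for all segments covering time t.

    Intervals are treated as start-inclusive and end-exclusive.
    """
    return {s.category: s.label for s in segments if s.start_s <= t < s.end_s}


# Invented example route, not taken from the dataset.
route = [
    Segment("road_type", "primary", 0.0, 120.0),
    Segment("road_type", "motorway", 120.0, 600.0),
    Segment("weather", "rainy", 0.0, 600.0),
    Segment("window_state", "closed", 0.0, 600.0),
]
scene = labels_at(route, 150.0)
print(scene)
```

A lookup like this gives the scene labels valid at any instant of a recording, which is one way to pair audio or video frames with their annotations when preparing training data.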