The bag-of-frames approach: A not so sufficient model for urban soundscapes

Mathieu Lagrange; Grégoire Lafay; Boris Défréville; Jean-Julien Aucouturier

doi:10.1121/1.4935350

J Acoust Soc Am. 2015 Nov;138(5):EL487-92. doi: 10.1121/1.4935350.

Authors

Mathieu Lagrange¹, Grégoire Lafay¹, Boris Défréville², Jean-Julien Aucouturier³

Affiliations

¹ Institut de Recherche en Communications et Cybernétique de Nantes, Ecole Centrale de Nantes, Nantes, France. Email: mathieu.lagrange@cnrs.fr, gregoire.lafay@irccyn.ec-nantes.fr.
² ORELIA, Thomery, France. Email: boris.defreville@gmail.com.
³ Institut de Recherche et Coordination Acoustique/Musique, Sciences et Technologies de la Musique et du Son, Centre National de la Recherche Scientifique, Université Pierre et Marie Curie, Paris, France. Email: aucouturier@gmail.com.

PMID: 26627819
DOI: 10.1121/1.4935350

Abstract

The "bag-of-frames" (BOF) approach, which encodes audio signals as the long-term statistical distribution of short-term spectral features, is commonly regarded as an effective and sufficient way to represent environmental sound recordings (soundscapes). The present paper describes a conceptual replication of a seminal article's use of the BOF approach, carried out on several other soundscape datasets; the results strongly question the adequacy of the BOF approach for the task. As this paper demonstrates, the good accuracy originally reported with BOF likely resulted from a particularly permissive dataset with low within-class variability. Soundscape modeling may therefore not be the closed case it was once thought to be.
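
To make the definition above concrete, the following is a minimal sketch of a bag-of-frames representation, written with NumPy. It is not the feature set or model used in the paper: the log band energies, the frame and hop sizes, the number of bands, and the mean/standard-deviation summary are all assumptions chosen for illustration, and the function names are hypothetical. The sketch only shows the general idea of computing short-term spectral features per frame and then keeping their long-term statistics.

```python
import numpy as np


def frame_signal(signal, frame_len=1024, hop=512):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def short_term_features(signal, frame_len=1024, hop=512, n_bands=20):
    """Log energy in n_bands linear frequency bands for each frame.

    Illustrative stand-in for a short-term spectral feature (assumption).
    """
    frames = frame_signal(signal, frame_len, hop) * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    band_energy = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return np.log(band_energy + 1e-10)  # small offset avoids log(0)


def bag_of_frames(signal, **kwargs):
    """Summarize all frames by per-dimension mean and standard deviation.

    The temporal order of the frames plays no role in this summary.
    """
    feats = short_term_features(signal, **kwargs)
    return feats.mean(axis=0), feats.std(axis=0)


# Usage on a synthetic signal (white noise standing in for a recording).
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
bof_mean, bof_std = bag_of_frames(x)

# Shuffling the frames leaves the summary unchanged, which is what
# makes the representation a "bag".
feats = short_term_features(x)
shuffled = feats[rng.permutation(len(feats))]
same_mean = np.allclose(shuffled.mean(axis=0), bof_mean)
same_std = np.allclose(shuffled.std(axis=0), bof_std)
```

Because the summary ignores frame order, two recordings whose frames have similar statistics are treated as similar regardless of when individual sound events occur. This is one way to read the abstract's point that such a model can look sufficient on a dataset with low within-class variability.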

Publication types

Research Support, Non-U.S. Gov't