Graph-Based Audio Classification Using Pre-Trained Models and Graph Neural Networks

Andrés Eduardo Castro-Ospina; Miguel Angel Solarte-Sanchez; Laura Stella Vega-Escobar; Claudia Isaza; Juan David Martínez-Vargas

doi:10.3390/s24072106

Sensors (Basel). 2024 Mar 26;24(7):2106. doi: 10.3390/s24072106.

Authors

Andrés Eduardo Castro-Ospina¹, Miguel Angel Solarte-Sanchez¹, Laura Stella Vega-Escobar¹, Claudia Isaza², Juan David Martínez-Vargas³

Affiliations

¹ Grupo de Investigación Máquinas Inteligentes y Reconocimiento de Patrones, Instituto Tecnológico Metropolitano, Medellín 050013, Colombia.
² SISTEMIC, Electronic Engineering Department, Universidad de Antioquia-UdeA, Medellín 050010, Colombia.
³ GIDITIC, Universidad EAFIT, Medellín 050022, Colombia.

Abstract

Sound classification plays a crucial role in the interpretation, analysis, and use of acoustic data, and it supports a wide range of practical applications, among which environmental sound analysis is one of the most important. In this paper, we explore the representation of audio data as graphs in the context of sound classification. We propose a methodology that uses pre-trained audio models to extract deep features from audio files; these features then serve as node information for building graphs. We then train several graph neural networks (GNNs), specifically graph convolutional networks (GCNs), GraphSAGE, and graph attention networks (GATs), to solve multi-class audio classification problems. Our findings underscore the effectiveness of representing audio data as graphs. They also show that GNNs are competitive in sound classification tasks: the GAT model performed best, achieving a mean accuracy of 83% in classifying environmental sounds and 91% in identifying the land cover of a site from its audio recording. Overall, this study provides novel insights into the potential of graph representation learning techniques for analyzing audio data.
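
The graph-building step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how edges between audio files are defined, so the cosine-similarity k-nearest-neighbor rule below is an assumption, as are the hypothetical function names `cosine`, `knn_graph`, and `mean_aggregate` and the two-dimensional toy vectors standing in for deep features from a pre-trained audio model. The unweighted neighborhood average is only a simplified stand-in for the learned message passing that GCN, GraphSAGE, and GAT layers perform; it is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_graph(features, k):
    """Link every node to its k most similar nodes (assumed edge rule).

    Returns a symmetric adjacency mapping {node: set(neighbors)}.
    """
    n = len(features)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(features[i], features[j]),
            reverse=True,
        )
        for j in ranked[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)  # keep the graph undirected
    return neighbors


def mean_aggregate(features, neighbors):
    """Average each node's features with those of its neighbors.

    This is one round of unweighted message passing with no learned
    parameters; real GNN layers add trainable weights (and, for GAT,
    attention coefficients) on top of this idea.
    """
    out = []
    for i, x in enumerate(features):
        group = [x] + [features[j] for j in sorted(neighbors[i])]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


# Toy stand-ins for embeddings of four audio files (two per class).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = knn_graph(features, k=1)
smoothed = mean_aggregate(features, graph)
```

In this toy case the two similar pairs of recordings end up connected to each other, and the aggregation step pulls each node's features toward those of its neighbor. In a node classification setting such as the one the abstract describes, a classifier would then be trained on node representations of this kind, with labels available for only part of the nodes.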

Keywords: ecoacoustics; environmental sound classification; graph neural networks; graph representation learning; node classification; pre-trained models.

Grants and funding

111585269779/Ministerio de Ciencia, Tecnología e Innovación