Evaluating the Performance of Pre-Trained Convolutional Neural Network for Audio Classification on Embedded Systems for Anomaly Detection in Smart Cities

Mimoun Lamrini; Mohamed Yassin Chkouri; Abdellah Touhafi

doi:10.3390/s23136227

Evaluating the Performance of Pre-Trained Convolutional Neural Network for Audio Classification on Embedded Systems for Anomaly Detection in Smart Cities

Sensors (Basel). 2023 Jul 7;23(13):6227. doi: 10.3390/s23136227.

Authors

Mimoun Lamrini^{1

2}, Mohamed Yassin Chkouri², Abdellah Touhafi^{1

3}

Affiliations

¹ Department of Engineering Sciences and Technology (INDI), Vrije Universiteit Brussel (VUB), 1050 Brussels, Belgium.
² SIGL Laboratory, National School of Applied Sciences of Tetuan, Abdelmalek Essaadi University, Tetuan 93000, Morocco.
³ Department of Electronics and Informatics (ETRO), Vrije Universiteit Brussel (VUB), 1050 Brussels, Belgium.

Abstract

Environmental Sound Recognition (ESR) plays a crucial role in smart cities by accurately categorizing audio using well-trained Machine Learning (ML) classifiers. This application is particularly valuable for cities that analyzed environmental sounds to gain insight and data. However, deploying deep learning (DL) models on resource-constrained embedded devices, such as Raspberry Pi (RPi) or Tensor Processing Units (TPUs), poses challenges. In this work, an evaluation of an existing pre-trained model for deployment on Raspberry Pi (RPi) and TPU platforms other than a laptop is proposed. We explored the impact of the retraining parameters and compared the sound classification performance across three datasets: ESC-10, BDLib, and Urban Sound. Our results demonstrate the effectiveness of the pre-trained model for transfer learning in embedded systems. On laptops, the accuracy rates reached 96.6% for ESC-10, 100% for BDLib, and 99% for Urban Sound. On RPi, the accuracy rates were 96.4% for ESC-10, 100% for BDLib, and 95.3% for Urban Sound, while on RPi with Coral TPU, the rates were 95.7% for ESC-10, 100% for BDLib and 95.4% for the Urban Sound. Utilizing pre-trained models reduces the computational requirements, enabling faster inference. Leveraging pre-trained models in embedded systems accelerates the development, deployment, and performance of various real-time applications.

Keywords: deep learning; embedded system; environment sound recognition; pre-trained models.

MeSH terms

Cities
Machine Learning*
Neural Networks, Computer*
Sound

Grants and funding

This research received no external funding.