A CNN Sound Classification Mechanism Using Data Augmentation

Hung-Chi Chu; Young-Lin Zhang; Hao-Chu Chiang

doi:10.3390/s23156972

Sensors (Basel). 2023 Aug 5;23(15):6972. doi: 10.3390/s23156972.

Authors

Hung-Chi Chu¹, Young-Lin Zhang¹, Hao-Chu Chiang¹

Affiliation

¹ Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 41349, Taiwan.

Abstract

Sound classification is widely used in many fields. Compared with traditional signal-processing methods, deep learning is one of the most feasible and effective approaches to sound classification. However, classification performance is limited by the quality of the training dataset, which is affected by cost and resource constraints, data imbalance, and data annotation issues. We therefore propose a sound classification mechanism based on convolutional neural networks (CNNs) that uses Mel-Frequency Cepstral Coefficient (MFCC) feature extraction to convert sound signals into spectrograms, which are suitable as input for CNN models. For data augmentation, we increase the number of spectrograms by setting different numbers of triangular bandpass filters. Experimental results show that on the ESC-50 dataset, which has 50 semantic categories of complex types and an insufficient amount of data, the classification accuracy is only 63%. With the proposed data augmentation method (K = 5), the accuracy increases to 97%. On the UrbanSound8K dataset, where the amount of data is sufficient, the classification accuracy reaches 90% and rises slightly to 92% with data augmentation. Moreover, when only 50% of the training dataset is used together with data augmentation, model training is accelerated and the classification accuracy still reaches 91%.
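The augmentation idea described in the abstract, producing several MFCC images from one clip by changing the number of triangular bandpass filters, can be sketched roughly as follows. This is a minimal NumPy illustration and not the authors' implementation: the function names, the filter counts (20, 26, 32, 40, 48), the frame size, hop length, and number of coefficients are all assumptions chosen for the example, since the abstract does not state them. Only K = 5 comes from the source.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular bandpass filters with centres spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)  # FFT bin frequencies
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mfcc_image(signal, sr, n_filters, n_mfcc=13, n_fft=512, hop=256):
    """Return an (n_mfcc, n_frames) MFCC array for one clip.

    n_mfcc, n_fft and hop are illustrative defaults, not values from the paper.
    """
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)            # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # power spectrum per frame
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)                # avoid log(0)
    # Unnormalised DCT-II over the filter axis, keeping the first n_mfcc coefficients
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return (log_mel @ dct.T).T

def augment(signal, sr, filter_counts):
    """One MFCC image per filter count, so K counts give K images per clip."""
    return [mfcc_image(signal, sr, m) for m in filter_counts]

# Hypothetical usage: K = 5 filter counts applied to a synthetic one-second clip.
sr = 16000
rng = np.random.default_rng(0)
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(sr)
images = augment(clip, sr, filter_counts=[20, 26, 32, 40, 48])
```

Because the number of cepstral coefficients is held fixed, every image has the same shape regardless of the filter count, so all K images from a clip can be fed to the same CNN input layer with the clip's label.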

Keywords: CNN; signal processing; sound classification.

Grants and funding

This research was funded by the Chaoyang University of Technology, Taiwan, under grant 110F0021110.