Combined spectral and speech features for pig speech recognition

Xuan Wu; Silong Zhou; Mingwei Chen; Yihang Zhao; Yifei Wang; Xianmeng Zhao; Danyang Li; Haibo Pu

doi:10.1371/journal.pone.0276778

PLoS One. 2022 Dec 1;17(12):e0276778. doi: 10.1371/journal.pone.0276778. eCollection 2022.

Authors

Xuan Wu¹, Silong Zhou¹, Mingwei Chen¹, Yihang Zhao¹, Yifei Wang², Xianmeng Zhao¹, Danyang Li¹, Haibo Pu¹

Affiliations

¹ College of Information Engineering, Sichuan Agricultural University, Ya'an, Sichuan, China.
² Department of Economics, University of Calgary, Calgary, AB, Canada.

Abstract

Pig vocalizations are an important indicator of the animal's condition: they can reflect states such as hunger, pain, or emotion, and directly indicate growth and health status. Existing speech recognition methods usually start from spectral features. Classifying sounds from spectrograms alone works well, but relying on a single type of feature input may not be the best approach for such tasks. On this basis, and to assess the condition of pigs more accurately so that timely measures can be taken to protect their health, this paper proposes a pig sound classification method that combines signal-spectrum features with speech features. Spectrograms visualize the characteristics of a sound over different time periods. The audio data are also used directly: the spectrogram features and the audio time-domain features complement each other as model input and are passed into a pre-designed parallel network structure. The best-performing network model and classifier were then selected and combined. An accuracy of 93.39% was achieved on the pig speech classification task, while the AUC reached 0.99163, demonstrating the superiority of the method. This study contributes to computer vision and acoustics research through the recognition of pig sounds. In addition, a dataset of 4,000 pig sound samples in four categories is established in this paper to provide a basis for future research.
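As a rough illustration of the two feature views described above, the following sketch computes a log-magnitude spectrogram and two frame-level time-domain features from the same clip. It is not the authors' pipeline: the frame length, hop size, the choice of short-time energy and zero-crossing rate as time-domain features, and all function names are assumptions made for this example, and a synthetic tone stands in for a pig recording.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def log_spectrogram(signal, frame_len=512, hop=256):
    """Spectral view: log-magnitude short-time Fourier transform."""
    frames = frame_signal(signal, frame_len, hop) * np.hanning(frame_len)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude)  # shape: (n_frames, frame_len // 2 + 1)

def time_domain_features(signal, frame_len=512, hop=256):
    """Time-domain view (assumed features): energy and zero-crossing rate."""
    frames = frame_signal(signal, frame_len, hop)
    energy = np.mean(frames ** 2, axis=1)
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return np.stack([energy, zcr], axis=1)  # shape: (n_frames, 2)

# Synthetic one-second 440 Hz tone used as a stand-in for a recording.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clip = 0.5 * np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(clip)        # input for a spectrogram branch
feats = time_domain_features(clip)  # input for a time-domain branch
```

In a parallel design of the kind the abstract describes, each array would feed its own branch before the branch outputs are merged for classification; the branch architectures themselves are not specified here.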

Copyright: © 2022 Wu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Acoustics
Animals
Emotions
Sound
Speech Perception*
Speech*
Swine

Associated data

figshare/10.6084/m9.figshare.16940389

Grants and funding

The author(s) received no specific funding for this work.