Dance emotion recognition based on linear predictive Mel frequency cepstrum coefficient and bidirectional long short-term memory from robot environment

Front Neurorobot. 2022 Nov 11;16:1067729. doi: 10.3389/fnbot.2022.1067729. eCollection 2022.

Abstract

Dance emotion recognition is an important research direction of automatic speech recognition, especially in the robot environment. Two central problems in dance emotion recognition are extracting the features that best represent speech emotion and constructing an acoustic model with strong robustness and generalization. The dance emotion data set is small in size and high in dimension. The traditional recurrent neural network (RNN) suffers from vanishing long-range dependencies, and because the convolutional neural network (CNN) focuses on local information, it insufficiently mines the potential relationships between frames in the input sequence. To solve these problems, this paper proposes a novel dance emotion recognition method based on the linear predictive Mel frequency cepstrum coefficient and a bidirectional long short-term memory (LSTM) network. The linear prediction coefficient (LPC) and the Mel frequency cepstrum coefficient (MFCC) are combined to obtain a new feature, the linear predictive Mel frequency cepstrum coefficient (LPMFCC). The LPMFCC is then combined with an energy feature to form the extracted dance feature, which is fed into the bidirectional LSTM network for training. Finally, a support vector machine (SVM) classifies the features obtained through the fully connected layer. Experiments on public data sets show better effectiveness than state-of-the-art dance emotion recognition methods.
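The abstract does not give implementation details, so the following is only a minimal sketch of the described pipeline: per-frame LPC and MFCC features are concatenated into an LPMFCC-style vector, augmented with frame energy, encoded by a bidirectional LSTM, and the resulting embedding is classified with an SVM. Frame sizes, feature orders, layer widths, and the pooling choice are illustrative assumptions, not the authors' settings.

```python
# Sketch (not the authors' code) of an LPMFCC + BiLSTM + SVM pipeline.
import numpy as np
import librosa
import torch
import torch.nn as nn
from sklearn.svm import SVC

def lpmfcc_features(y, sr, n_mfcc=13, lpc_order=12,
                    frame_length=1024, hop_length=512):
    """Concatenate per-frame MFCC, LPC, and RMS-energy features."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=frame_length, hop_length=hop_length)
    energy = librosa.feature.rms(y=y, frame_length=frame_length,
                                 hop_length=hop_length)
    # LPC coefficients computed independently for each frame
    frames = librosa.util.frame(y, frame_length=frame_length,
                                hop_length=hop_length)
    lpc = np.stack([librosa.lpc(frames[:, i], order=lpc_order)[1:]
                    for i in range(frames.shape[1])], axis=1)
    # Align frame counts (padding conventions differ across features)
    n = min(mfcc.shape[1], energy.shape[1], lpc.shape[1])
    feats = np.vstack([mfcc[:, :n], lpc[:, :n], energy[:, :n]])
    return feats.T.astype(np.float32)            # (n_frames, n_features)

class BiLSTMEncoder(nn.Module):
    """Bidirectional LSTM mapping a feature sequence to a fixed-size vector."""
    def __init__(self, n_features, hidden=128, embed_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(2 * hidden, embed_dim)   # fully connected layer

    def forward(self, x):                            # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.fc(out.mean(dim=1))              # temporal average pooling

# Hypothetical usage: encode each labelled clip, then fit an SVM on the
# embeddings. `clips` and `labels` stand in for a real emotion corpus, and
# the encoder would normally be trained before its features are used.
# encoder = BiLSTMEncoder(n_features=13 + 12 + 1)
# X = np.stack([
#     encoder(torch.from_numpy(lpmfcc_features(y, sr))[None]).detach().numpy()[0]
#     for y, sr in clips])
# svm = SVC(kernel="rbf").fit(X, labels)
```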

Keywords: Mel frequency cepstrum coefficient; SVM; bidirectional long short-term memory; dance emotion recognition; linear prediction coefficient; robot environment.