Audio-Visual Stress Classification Using Cascaded RNN-LSTM Networks

Megha V Gupta; Shubhangi Vaikole; Ankit D Oza; Amisha Patel; Diana Petronela Burduhos-Nergis; Dumitru Doru Burduhos-Nergis

doi:10.3390/bioengineering9100510

Bioengineering (Basel). 2022 Sep 27;9(10):510. doi: 10.3390/bioengineering9100510.

Authors

Megha V Gupta¹, Shubhangi Vaikole², Ankit D Oza³, Amisha Patel⁴, Diana Petronela Burduhos-Nergis⁵, Dumitru Doru Burduhos-Nergis⁵

Affiliations

¹ Department of Computer Engineering, New Horizon Institute of Technology and Management, University of Mumbai, Mumbai 400615, Maharashtra, India.
² Department of Computer Engineering, Datta Meghe College of Engineering, University of Mumbai, Mumbai 400708, Maharashtra, India.
³ Department of Computer Sciences and Engineering, Institute of Advanced Research, Gandhinagar 382426, Gujarat, India.
⁴ Department of Mathematics, Institute of Technology, Ahmedabad 382481, Gujarat, India.
⁵ Faculty of Materials Science and Engineering, Gheorghe Asachi Technical University of Iasi, 700050 Iasi, Romania.

Abstract

The purpose of this research is to emphasize the importance of mental health and to contribute to the overall well-being of humankind by detecting stress. Stress is a state of mental or physical strain. It can result from any event or thought that frustrates, angers, or unnerves a person; it is the body's response to a demand or challenge. Stress affects people daily and can be regarded as a hidden pandemic. Long-term (chronic) stress keeps the stress response continuously activated, which wears down the body over time. Its symptoms are behavioral, emotional, and physical. The most common way to measure stress is to administer brief self-report questionnaires such as the Perceived Stress Scale. However, self-report questionnaires frequently lack item specificity and validity, and interview-based measures can be time-consuming and expensive. In this research, we propose a novel method for detecting human mental stress by processing audio-visual data, with a focus on audio-visual stress identification. Using a cascaded RNN-LSTM strategy, we achieved 91% accuracy on the RAVDESS dataset, classifying eight emotions and, ultimately, stressed and unstressed states.
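The abstract describes a cascade: eight emotion classes first, then a stressed/unstressed decision. The following is a minimal sketch of only the label side of such a cascade, not the authors' method. It reads the emotion code from a RAVDESS file name (the third of seven dash-separated fields, codes 01 to 08), maps emotions to stress states, and collapses an eight-way probability vector into P(stressed) by summing the probabilities of the "stressed" emotions. The grouping of emotions into stressed and unstressed is a HYPOTHETICAL assumption, since the abstract does not state the mapping used; all function and constant names are illustrative, and the RNN-LSTM model itself is not shown.

```python
from typing import Dict, Sequence

# RAVDESS emotion codes (third field of each file name).
RAVDESS_EMOTIONS: Dict[str, str] = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}

# HYPOTHETICAL grouping: which emotions count as "stressed".
# The abstract does not specify the authors' mapping.
STRESSED_EMOTIONS = frozenset({"sad", "angry", "fearful", "disgust"})

# Assumed class order of the emotion classifier's output vector (codes 01..08).
EMOTION_ORDER = [RAVDESS_EMOTIONS[f"{i:02d}"] for i in range(1, 9)]


def emotion_from_filename(filename: str) -> str:
    """Return the emotion label encoded in a RAVDESS file name.

    Names look like '03-01-06-01-02-01-12.wav': modality, vocal channel,
    emotion, intensity, statement, repetition, actor.
    """
    stem = filename.rsplit(".", 1)[0]
    fields = stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS file name: {filename!r}")
    return RAVDESS_EMOTIONS[fields[2]]


def stress_label(emotion: str) -> str:
    """Map an emotion label to 'stressed' or 'unstressed' (hypothetical grouping)."""
    if emotion not in EMOTION_ORDER:
        raise ValueError(f"unknown emotion: {emotion!r}")
    return "stressed" if emotion in STRESSED_EMOTIONS else "unstressed"


def stress_probability(emotion_probs: Sequence[float]) -> float:
    """Second cascade stage: collapse eight emotion probabilities into P(stressed).

    `emotion_probs` is assumed to be a softmax output ordered as EMOTION_ORDER.
    """
    if len(emotion_probs) != len(EMOTION_ORDER):
        raise ValueError("expected one probability per emotion")
    return sum(p for p, e in zip(emotion_probs, EMOTION_ORDER) if e in STRESSED_EMOTIONS)


if __name__ == "__main__":
    emotion = emotion_from_filename("03-01-06-01-02-01-12.wav")
    print(emotion, stress_label(emotion))  # fearful stressed
    probs = [0.05, 0.05, 0.1, 0.2, 0.3, 0.1, 0.1, 0.1]
    print(round(stress_probability(probs), 2))  # 0.7
```

Summing class probabilities keeps the second stage consistent with the first: P(stressed) and P(unstressed) still add to one. A learned second stage (for example, a small classifier on the emotion outputs) would be an equally plausible reading of "cascaded".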

Keywords: RNN-LSTM; action units; audio visual; emotion; speech; stress.

Grants and funding

FCSU-2022/Gheorghe Asachi Technical University of Iaşi-TUIASI- Romania, Scientific Research Funds