On the effectiveness of facial expression recognition for evaluation of urban sound perception

Qi Meng; Xuejun Hu; Jian Kang; Yue Wu

doi:10.1016/j.scitotenv.2019.135484

Sci Total Environ. 2020 Mar 25;710:135484. doi: 10.1016/j.scitotenv.2019.135484. Epub 2019 Nov 21.

Authors

Qi Meng¹, Xuejun Hu¹, Jian Kang², Yue Wu³

Affiliations

¹ Key Laboratory of Cold Region Urban and Rural Human Settlement Environment Science and Technology, Ministry of Industry and Information Technology, School of Architecture, Harbin Institute of Technology, 150001, 66 West Dazhi Street, Nan Gang District, Harbin, China.
² Key Laboratory of Cold Region Urban and Rural Human Settlement Environment Science and Technology, Ministry of Industry and Information Technology, School of Architecture, Harbin Institute of Technology, 150001, 66 West Dazhi Street, Nan Gang District, Harbin, China; UCL Institute for Environmental Design and Engineering, University College London (UCL), London WC1H 0NN, UK. Electronic address: j.kang@ucl.ac.uk.
³ Key Laboratory of Cold Region Urban and Rural Human Settlement Environment Science and Technology, Ministry of Industry and Information Technology, School of Architecture, Harbin Institute of Technology, 150001, 66 West Dazhi Street, Nan Gang District, Harbin, China. Electronic address: wuyuehit@hit.edu.cn.

PMID: 31780160
DOI: 10.1016/j.scitotenv.2019.135484

Abstract

Sound perception studies mostly depend on questionnaires with fixed indicators. Therefore, it is desirable to explore methods with dynamic outputs. The present study explores how sound perception in the urban environment affects facial expressions, using FaceReader, a software package based on facial expression recognition (FER). The experiment involved three typical urban sound recordings, namely, traffic noise, natural sound, and community sound. A questionnaire on the evaluation of sound perception was also used for comparison. The results show that, first, FER is an effective tool for sound perception research, since it is capable of detecting differences in participants' reactions to different sounds and how their facial expressions change over time in response to those sounds, with mean differences in valence between recordings ranging from 0.019 to 0.059 (p < 0.05 or p < 0.01). In a natural sound environment, for example, facial expression increased by 0.04 in the first 15 s and then decreased steadily by 0.004 every 20 s. Second, the expression indices, namely, happy, sad, and surprised, change significantly under the effect of sound perception. In the traffic sound environment, for example, happy decreased by 0.012, sad increased by 0.032, and surprised decreased by 0.018. Furthermore, social characteristics such as distance from living place to natural environment (r = 0.313), inclination to communicate (r = 0.253), and preference for crowds (r = 0.296) have effects on facial expression. Finally, the comparison of FER and questionnaire survey results showed that in the traffic noise recording, valence in the first 20 s best represents acoustic comfort and eventfulness; for natural sound, valence in the first 40 s best represents pleasantness; and for community sound, valence in the first 20 s of the recording best represents acoustic comfort, subjective loudness, and calmness.
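
The abstract does not state how the "best representing" time window was chosen. One plausible approach is to average each participant's valence over a candidate window (for example, the first 20 s or the first 40 s) and correlate those averages with the questionnaire ratings. The Python sketch below illustrates that idea only. The function names, the `(time_s, valence)` sample layout, and all numbers are hypothetical placeholders; they are not the authors' procedure, not FaceReader's export format, and not data from the study.

```python
import math
from statistics import mean


def window_mean(samples, start_s, end_s):
    """Mean valence of (time_s, valence) samples with start_s <= time_s < end_s."""
    values = [v for t, v in samples if start_s <= t < end_s]
    if not values:
        raise ValueError("no samples fall inside the requested window")
    return mean(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical illustration: one valence trace per participant, sampled
# every 10 s, plus one questionnaire rating per participant (made-up values).
traces = [
    [(0, 0.02), (10, 0.05), (20, 0.04), (30, 0.03)],
    [(0, -0.01), (10, 0.00), (20, 0.01), (30, 0.00)],
    [(0, 0.06), (10, 0.08), (20, 0.05), (30, 0.04)],
]
comfort_ratings = [4, 2, 5]

# Correlate window-mean valence with the ratings for two candidate windows.
r_first_20s = pearson_r([window_mean(t, 0, 20) for t in traces], comfort_ratings)
r_first_40s = pearson_r([window_mean(t, 0, 40) for t in traces], comfort_ratings)
```

Under this assumption, the window with the strongest correlation for a given questionnaire item would be read as the one that "best represents" it. A real analysis would also need significance testing and a far larger sample than this toy example.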

Keywords: FaceReader; Facial expression recognition; Sound perception; Urban soundscape.

MeSH terms

Acoustics
Auditory Perception
Facial Expression*
Humans
Noise
Recognition, Psychology
Sound*