Patterns of saliency and semantic features distinguish gaze of expert and novice viewers of surveillance footage

Yujia Peng; Joseph M Burling; Greta K Todorova; Catherine Neary; Frank E Pollick; Hongjing Lu

doi:10.3758/s13423-024-02454-y

Psychon Bull Rev. 2024 Jan 25. doi: 10.3758/s13423-024-02454-y. Online ahead of print.

Authors

Yujia Peng¹ ² ³ ⁴, Joseph M Burling⁵, Greta K Todorova⁶, Catherine Neary⁷, Frank E Pollick⁶, Hongjing Lu⁵ ⁸

Affiliations

¹ School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, 100871, China. yujia_peng@pku.edu.cn.
² Institute for Artificial Intelligence, Peking University, Beijing, China. yujia_peng@pku.edu.cn.
³ National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence, Beijing, China. yujia_peng@pku.edu.cn.
⁴ Department of Psychology, University of California, Los Angeles, CA, USA. yujia_peng@pku.edu.cn.
⁵ Department of Psychology, University of California, Los Angeles, CA, USA.
⁶ School of Psychology and Neuroscience, University of Glasgow, Glasgow, UK.
⁷ School of Health and Social Wellbeing, The University of the West of England, Bristol, UK.
⁸ Department of Statistics, University of California, Los Angeles, CA, USA.

PMID: 38273144
DOI: 10.3758/s13423-024-02454-y

Abstract

When viewing the actions of others, we not only see patterns of body movements, but we also "see" the intentions and social relations of people. Experienced forensic examiners, namely closed-circuit television (CCTV) operators, have been shown to outperform novices in identifying and predicting hostile intentions from surveillance footage. However, it remains largely unknown what visual content CCTV operators actively attend to, and whether their strategies for active information seeking differ from those of novices. Here, we conducted computational analyses of the gaze-centered stimuli captured from the eye movements of experienced CCTV operators and novices as they viewed the same surveillance footage. Low-level image features were extracted by a visual saliency model, whereas object-level semantic features were extracted from gaze-centered regions by a deep convolutional neural network (DCNN), AlexNet. We found that the looking behavior of CCTV operators differs from that of novices: operators actively attend to visual content with different patterns of saliency and semantic features. Expertise in selectively utilizing informative features at different levels of the visual hierarchy may play an important role in facilitating the efficient detection of social relationships between agents and the prediction of harmful intentions.
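
The gaze-centered analysis described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the saliency model, the patch size, or the summary statistic, so the function names (`gaze_patch`, `mean_patch_value`), the `half` parameter, the toy 4x4 map, and both fixation lists are hypothetical and chosen only to show the idea of sampling a feature map around each fixation and comparing two sets of fixations.

```python
from statistics import mean


def gaze_patch(feature_map, gx, gy, half=1):
    """Return the square patch of `feature_map` centered on gaze point (gx, gy).

    `feature_map` is a list of rows (y index first, then x). The patch spans
    `half` cells on each side of the gaze point and is clipped at the borders.
    """
    height, width = len(feature_map), len(feature_map[0])
    top, bottom = max(0, gy - half), min(height, gy + half + 1)
    left, right = max(0, gx - half), min(width, gx + half + 1)
    return [row[left:right] for row in feature_map[top:bottom]]


def mean_patch_value(feature_map, gx, gy, half=1):
    """Average the feature values inside the gaze-centered patch."""
    patch = gaze_patch(feature_map, gx, gy, half)
    return mean(value for row in patch for value in row)


# Toy saliency map (made-up values, not data from the study).
saliency = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

# Two hypothetical sets of fixations, given as (x, y) cell coordinates.
fixations_a = [(1, 1), (2, 2)]
fixations_b = [(0, 0), (3, 3)]

# Mean saliency at gaze for each set of fixations.
score_a = mean(mean_patch_value(saliency, x, y) for x, y in fixations_a)
score_b = mean(mean_patch_value(saliency, x, y) for x, y in fixations_b)
```

In a fuller analysis, the same gaze-centered crops would be taken from video frames and passed to a feature extractor such as a pretrained DCNN; the per-patch averaging here stands in for that step only to keep the sketch dependency-free.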

Keywords: Deep convolutional neural network (DCNN); Eye movements; Intention; Saliency; Social interaction; Visual expertise.