Patterns of saliency and semantic features distinguish gaze of expert and novice viewers of surveillance footage

Psychon Bull Rev. 2024 Jan 25. doi: 10.3758/s13423-024-02454-y. Online ahead of print.

Abstract

When viewing the actions of others, we not only see patterns of body movements, but we also "see" the intentions and social relations of people. Closed-circuit television (CCTV) operators, who are experienced forensic examiners, have been shown to outperform novices in identifying and predicting hostile intentions from surveillance footage. However, it remains largely unknown what visual content CCTV operators actively attend to, and whether their strategies for active information seeking differ from those of novices. Here, we conducted a computational analysis of the gaze-centered stimuli sampled by the eye movements of experienced CCTV operators and novices as they viewed the same surveillance footage. Low-level image features were extracted by a visual saliency model, whereas object-level semantic features were extracted from gaze-centered regions by a deep convolutional neural network (DCNN), AlexNet. We found that the looking behavior of CCTV operators differs from that of novices: operators actively attend to visual content with different patterns of saliency and semantic features. Expertise in selectively utilizing informative features at different levels of the visual hierarchy may play an important role in facilitating the efficient detection of social relationships between agents and the prediction of harmful intentions.
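
As an illustration of what a "gaze-centered region" could look like in practice, the following minimal sketch crops a fixed-size patch around a gaze coordinate in a video frame. It is not the authors' pipeline: the function name `gaze_centered_patch`, the default patch size of 224 pixels, and the zero-padding at frame borders are all assumptions made for the example, and the gaze point is assumed to lie inside the frame. Patches of this kind could then be passed to a saliency model or to a network such as AlexNet for feature extraction.

```python
import numpy as np


def gaze_centered_patch(frame, gaze_xy, size=224):
    """Crop a size x size patch centred on a gaze point given as (x, y).

    Hypothetical helper for illustration only. `size=224` is an arbitrary
    choice, not a value reported in the abstract. Parts of the patch that
    fall outside the frame are filled with zeros so that every patch has
    the same shape. The gaze point is assumed to lie inside the frame.
    """
    x, y = int(round(gaze_xy[0])), int(round(gaze_xy[1]))
    half = size // 2

    # Pad rows and columns (but not colour channels) so that any gaze
    # point inside the frame yields a complete patch.
    pad = [(half, size - half), (half, size - half)]
    pad += [(0, 0)] * (frame.ndim - 2)
    padded = np.pad(frame, pad, mode="constant")

    # Original pixel (y, x) now sits at (y + half, x + half), so a window
    # starting at (y, x) in padded coordinates is centred on the gaze point.
    return padded[y:y + size, x:x + size]


if __name__ == "__main__":
    # Synthetic 10 x 10 grayscale frame with a 5 x 5 patch, for readability.
    frame = np.arange(1, 101).reshape(10, 10)
    patch = gaze_centered_patch(frame, gaze_xy=(3, 4), size=5)
    print(patch.shape)
    print(patch[2, 2] == frame[4, 3])
```

Zero-padding keeps patch shapes uniform for fixations near the frame border; whether border fixations should instead be discarded or mirrored is a design choice the abstract does not specify.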

Keywords: Deep convolutional neural network (DCNN); Eye movements; Intention; Saliency; Social interaction; Visual expertise.