Social Image Captioning: Exploring Visual Attention and User Attention

Sensors (Basel). 2018 Feb 22;18(2):646. doi: 10.3390/s18020646.

Abstract

Image captioning in natural language has become an emerging trend. However, social images, which are associated with sets of user-contributed tags, have rarely been investigated for a similar task. The user-contributed tags, which can reflect user attention, have been neglected in conventional image captioning, and most existing image captioning models cannot be applied directly to social images. In this work, a dual attention model is proposed for social image captioning that combines visual attention and user attention simultaneously. Visual attention is used to compress a large amount of salient visual information, while user attention is applied to adjust the description of a social image according to its user-contributed tags. Experiments conducted on the Microsoft (MS) COCO dataset demonstrate the superiority of the proposed dual attention method.
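The abstract does not specify the model internals, but the dual attention idea can be sketched as two parallel attention mechanisms over image regions and tag embeddings whose contexts are fused at each decoding step. The following is a minimal NumPy sketch under assumed shapes and a simple dot-product attention with concatenation fusion; all function names, dimensions, and the fusion choice are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def dual_attention_context(visual_feats, tag_embeds, query):
    """Combine a visual context and a user (tag) context for one decoding step.

    visual_feats: (R, D) image-region features (R regions, assumed shape)
    tag_embeds:   (T, D) embeddings of user-contributed tags (assumed shape)
    query:        (D,)   decoder hidden state at the current step
    """
    # Visual attention: weight image regions by relevance to the query.
    v_weights = softmax(visual_feats @ query)
    v_context = v_weights @ visual_feats
    # User attention: weight user-contributed tags the same way.
    t_weights = softmax(tag_embeds @ query)
    t_context = t_weights @ tag_embeds
    # Fuse the two contexts (simple concatenation; an assumed fusion choice).
    return np.concatenate([v_context, t_context])

rng = np.random.default_rng(0)
ctx = dual_attention_context(rng.normal(size=(5, 8)),   # 5 regions
                             rng.normal(size=(3, 8)),   # 3 tags
                             rng.normal(size=8))
print(ctx.shape)  # (16,)
```

The fused context would then condition the next word prediction of the caption decoder; the visual branch selects salient regions while the tag branch steers wording toward what the user attended to.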

Keywords: social image captioning; user attention; user-contributed tags; visual attention.

MeSH terms

  • Attention*
  • Humans
  • Language
  • Visual Perception