MASK-RL: Multiagent Video Object Segmentation Framework Through Reinforcement Learning

IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5103-5115. doi: 10.1109/TNNLS.2019.2963282. Epub 2020 Nov 30.

Abstract

Integrating human-provided location priors into video object segmentation has been shown to be an effective strategy to enhance performance, but applying such priors at large scale is infeasible. Gamification can help reduce the annotation burden, but it still requires user involvement. We propose a video object segmentation framework that combines the advantages of user feedback and gamification by simulating multiple game players through a reinforcement learning (RL) model that reproduces the human ability to pinpoint moving objects, and by using the simulated feedback to drive the decisions of a fully convolutional deep segmentation network. Experimental results on the DAVIS-17 benchmark show that: 1) including a user-provided prior, even an imprecise one, yields high performance; 2) our RL agent satisfactorily replicates the variability of humans in identifying spatiotemporally salient objects; and 3) employing artificially generated priors in an unsupervised video object segmentation model reaches state-of-the-art performance.
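
To illustrate the general idea of conditioning a segmentation network on a (possibly imprecise) location prior, the following minimal PyTorch sketch rasterizes a simulated player's click as a Gaussian heat map and feeds it to a small fully convolutional network as an extra input channel. This is not the authors' implementation; the network, the Gaussian prior, and all sizes and names are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): a simulated "player" provides a rough
# location prior for the target object, rasterized as a Gaussian heat map and
# concatenated to the RGB frame as a fourth input channel of a small FCN.

import torch
import torch.nn as nn


def location_prior(h, w, cx, cy, sigma=20.0):
    """Rasterize a (possibly imprecise) click at (cx, cy) into a Gaussian heat map."""
    ys = torch.arange(h, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, -1)
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))


class PriorGuidedFCN(nn.Module):
    """Tiny fully convolutional net taking RGB + prior heat map (4 channels)."""

    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1),  # per-pixel foreground logit
        )

    def forward(self, rgb, prior):
        x = torch.cat([rgb, prior.unsqueeze(1)], dim=1)  # (B, 4, H, W)
        return self.body(x)


# Example: one frame and one simulated (noisy) player click near the object.
frame = torch.rand(1, 3, 128, 128)
prior = location_prior(128, 128, cx=70.0, cy=55.0).unsqueeze(0)  # (1, H, W)
mask_logits = PriorGuidedFCN()(frame, prior)
print(mask_logits.shape)  # torch.Size([1, 1, 128, 128])
```

In the paper's setting, the clicks would come not from a human but from an RL agent trained to mimic the variability of human players; the sketch above only shows how such a prior could be injected into the segmentation network's input.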