Self-Teaching Video Object Segmentation

IEEE Trans Neural Netw Learn Syst. 2022 Apr;33(4):1623-1637. doi: 10.1109/TNNLS.2020.3043099. Epub 2022 Apr 4.

Abstract

Video object segmentation (VOS) is one of the most fundamental tasks for numerous downstream video applications. The crucial issue in online VOS is the drifting of the segmenter when it is incrementally updated on successive video frames under unreliable supervision. In this work, we propose a self-teaching VOS (ST-VOS) method that enables the segmenter to learn online adaptation as confidently as possible. When learning the segmenter at each time step, the segmentation hypothesis and the segmenter update are enclosed in a self-looping optimization cycle so that they mutually improve each other. To reduce error accumulation in the self-looping process, we introduce a metalearning strategy that learns how to perform this optimization within only a few iteration steps. To this end, the learning rates of the segmenter are adaptively derived through metaoptimization in the channel space of the convolutional kernels. Furthermore, to better launch the self-looping process, we compute an initial mask map from part detectors and motion flow, establishing a solid foundation for subsequent refinement and making the segmenter update more robust. Extensive experiments demonstrate that the self-teaching idea boosts the performance of baselines, and our ST-VOS achieves encouraging results on the DAVIS16, YouTube-Objects, DAVIS17, and SegTrackV2 data sets; in particular, it reaches 75.7% J-mean accuracy on the multi-instance DAVIS17 data set.
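
The channel-wise meta-learned learning rates and the self-looping update described above can be illustrated with a small sketch. The PyTorch code below is not the authors' implementation: the tiny segmenter, the per-channel rate dictionary channel_lrs, and the pseudo-label loss are stand-ins assumed for illustration only; in ST-VOS the rates would be produced by the outer meta-optimization and the hypotheses by the full refinement pipeline.

```python
# Illustrative sketch (not the paper's code): a few online update steps of a
# convolutional segmenter driven by its own mask hypotheses (pseudo-labels),
# with one learning rate per output channel of each convolutional kernel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySegmenter(nn.Module):
    """Stand-in conv segmenter; the real model is a full VOS network."""
    def __init__(self, in_ch=3, mid_ch=8):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(mid_ch, 1, 3, padding=1)

    def forward(self, x):
        return self.conv2(F.relu(self.conv1(x)))  # foreground logits

def online_self_loop(segmenter, channel_lrs, frame, num_steps=3):
    """One self-looping cycle: predict a mask hypothesis, then update the
    segmenter with per-channel learning rates (assumed here to come from an
    outer meta-optimization)."""
    for _ in range(num_steps):
        logits = segmenter(frame)
        # Segmentation hypothesis used as a pseudo-label for the next update.
        pseudo_mask = (torch.sigmoid(logits) > 0.5).float().detach()
        loss = F.binary_cross_entropy_with_logits(logits, pseudo_mask)
        grads = torch.autograd.grad(loss, list(segmenter.parameters()))
        with torch.no_grad():
            for (name, p), g in zip(segmenter.named_parameters(), grads):
                lr = channel_lrs[name]
                if p.dim() == 4:
                    # Conv weights: broadcast one rate per output channel.
                    p -= lr.view(-1, 1, 1, 1) * g
                else:
                    p -= lr * g
    return segmenter

segmenter = TinySegmenter()
# Toy fixed rates; in ST-VOS these would be meta-learned per channel.
channel_lrs = {name: 1e-3 * torch.ones(p.shape[0])
               for name, p in segmenter.named_parameters()}
frame = torch.rand(1, 3, 64, 64)
online_self_loop(segmenter, channel_lrs, frame)
```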