Unsupervised Video Matting via Sparse and Low-Rank Representation

IEEE Trans Pattern Anal Mach Intell. 2020 Jun;42(6):1501-1514. doi: 10.1109/TPAMI.2019.2895331. Epub 2019 Jan 25.

Abstract

We propose a novel method, unsupervised video matting via sparse and low-rank representation, which achieves high-quality results on a variety of challenging examples featuring illumination changes, feature ambiguity, topology changes, transparency variation, dis-occlusion, fast motion, and motion blur. Some previous matting methods introduced a nonlocal prior to search for samples when estimating the alpha matte, and achieved impressive results on some data. However, on one hand, searching for too few or too many samples may miss good samples or introduce noise; on the other hand, it is difficult to construct consistent nonlocal structures for pixels with similar features, yielding video mattes with spatial and temporal inconsistency. In this paper, we propose a novel video matting method that achieves spatially and temporally consistent matting results. Toward this end, a sparse and low-rank representation model is introduced to pursue consistent nonlocal structures for pixels with similar features. The sparse representation adaptively selects the best samples and accurately constructs the nonlocal structures for all pixels, while the low-rank representation globally ensures consistent nonlocal structures for pixels with similar features. The two representations are combined to generate spatially and temporally consistent video mattes. We test our method on numerous datasets, including the benchmark dataset for image matting and datasets for video matting. Our method achieves the best performance among all unsupervised matting methods on the public alpha matting evaluation dataset for images.
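The abstract does not give the model's equations, so the following is only a toy sketch of the general idea it describes: representing pixel features over a dictionary of candidate samples with a coefficient matrix that is encouraged to be both sparse (each pixel uses few samples) and low-rank (similar pixels share consistent nonlocal structure). The objective, the approximate proximal-gradient solver, and all names (`sparse_low_rank_code`, `lam1`, `lam2`) are assumptions for illustration, not the authors' actual formulation.

```python
import numpy as np

def soft_threshold(A, tau):
    # Proximal map of tau * ||.||_1: elementwise shrinkage toward zero.
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svt(A, tau):
    # Proximal map of tau * ||.||_* (nuclear norm): singular value thresholding.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def sparse_low_rank_code(X, D, lam1=0.05, lam2=0.05, n_iter=200):
    """Toy solver (assumed objective, not the paper's) for
        min_Z  0.5 * ||X - D Z||_F^2 + lam1 * ||Z||_1 + lam2 * ||Z||_*
    X: feature matrix of pixels (d x p); D: dictionary of samples (d x k).
    Alternates a gradient step on the smooth term with the two proximal
    maps -- a crude approximation, adequate only as an illustration.
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    Z = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ Z - X)           # gradient of 0.5 * ||X - D Z||_F^2
        Z = Z - grad / L
        Z = svt(Z, lam2 / L)               # push toward low rank (consistency)
        Z = soft_threshold(Z, lam1 / L)    # push toward sparsity (few samples)
    return Z

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 30))                 # hypothetical sample dictionary
X = D @ rng.standard_normal((30, 10)) * 0.1       # hypothetical pixel features
Z = sparse_low_rank_code(X, D)
```

In this toy setting, the ℓ1 term plays the role the abstract assigns to the sparse representation (selecting a few good samples per pixel), while the nuclear-norm term plays the role of the low-rank representation (forcing pixels with similar features to share coefficients).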