Image Co-Skeletonization via Co-Segmentation

IEEE Trans Image Process. 2021;30:2784-2797. doi: 10.1109/TIP.2021.3054464. Epub 2021 Feb 12.

Abstract

Recent advances in the joint processing of image collections have shown its advantages over processing images individually. Unlike existing works geared towards co-segmentation or co-localization, this article explores a new joint processing topic: image co-skeletonization, defined as the joint skeleton extraction of the foreground objects in an image collection. Object skeletonization in a single natural image is well known to be challenging, because hardly any prior knowledge about the object in the image is available. We therefore turn to image co-skeletonization, in the hope that the commonness prior across semantically similar images can supply such knowledge, as it does in other joint processing problems such as co-segmentation. Moreover, earlier research has found that augmenting a skeletonization process with the object's shape information is highly beneficial for capturing the image context. Based on these two observations, we propose a coupled framework for the co-skeletonization and co-segmentation tasks, in which the co-segmentation process supplies shape information to the co-skeletonization process. While image co-skeletonization is our primary goal, the co-segmentation process can in turn exploit the skeleton outputs of the co-skeletonization process as central object seeds, so that the two processes benefit from each other synergistically. To evaluate image co-skeletonization, we also construct a novel benchmark dataset by annotating nearly 1,800 images divided into 38 semantic categories. Although the proposed method is essentially weakly supervised, it can also be employed in supervised and unsupervised scenarios. Extensive experiments demonstrate that it achieves promising results in all three scenarios.
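The coupled framework the abstract describes alternates between the two processes: segmentation supplies a shape prior for skeletonization, and the resulting skeleton seeds the next segmentation round. The following minimal Python sketch illustrates that alternation on a single image. It is not the paper's algorithm: the helper refine_mask_from_seed, its thresholded-dilation segmentation stand-in, and the fixed iteration count are illustrative assumptions, whereas the paper optimizes jointly across the whole collection of semantically similar images.

```python
import numpy as np
from skimage.morphology import skeletonize, binary_dilation, disk

def refine_mask_from_seed(image: np.ndarray, seed: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the co-segmentation step: grow the
    skeleton seed into a foreground mask via dilation gated by a crude
    intensity threshold. The actual method instead segments jointly
    across the image collection."""
    grown = binary_dilation(seed, disk(5))
    return grown & (image > image.mean())

def coupled_iteration(image: np.ndarray, initial_mask: np.ndarray,
                      n_iters: int = 3):
    """Alternate skeletonization and skeleton-seeded segmentation
    until the two estimates stabilize (here: a fixed number of rounds)."""
    mask = initial_mask.astype(bool)
    for _ in range(n_iters):
        skeleton = skeletonize(mask)                   # shape prior -> skeleton
        mask = refine_mask_from_seed(image, skeleton)  # skeleton seed -> new shape
    return skeletonize(mask), mask
```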