Privacy-Preserving Semantic Segmentation Using Vision Transformer

Hitoshi Kiya; Teru Nagamori; Shoko Imaizumi; Sayaka Shiota

doi:10.3390/jimaging8090233

J Imaging. 2022 Aug 30;8(9):233. doi: 10.3390/jimaging8090233.

Authors

Hitoshi Kiya¹, Teru Nagamori¹, Shoko Imaizumi², Sayaka Shiota¹

Affiliations

¹ Department of Computer Science, Tokyo Metropolitan University, 6-6 Asahigaoka, Hino-shi, Tokyo 191-0065, Japan.
² Graduate School of Engineering, Chiba University, 1-33 Yayoicho, Chiba 263-8522, Japan.

Abstract

In this paper, we propose a privacy-preserving semantic segmentation method that uses encrypted images and models with the segmentation transformer (SETR), a model based on the vision transformer (ViT). The combined use of encrypted images and SETR allows us not only to apply images without sensitive visual information to SETR as query images but also to maintain the same accuracy as that obtained with plain images. Previous privacy-preserving methods that use encrypted images with deep neural networks have focused on image classification tasks. In addition, these conventional methods yield lower accuracy than models trained with plain images because of the influence of image encryption. To overcome these issues, we propose a novel privacy-preserving semantic segmentation method that uses, for the first time, the embedding structure of ViT. In experiments, the proposed method was demonstrated to achieve the same accuracy with encrypted images as with plain images.

Keywords: privacy-preserving; segmentation transformer; semantic segmentation; vision transformer.
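
The abstract does not spell out the encryption scheme, so the following is only an illustrative sketch of why a ViT-style patch embedding can, in principle, absorb a keyed pixel shuffle without changing its output. It assumes that each image patch is flattened and shuffled with a secret permutation, and that the rows of the linear patch-embedding matrix are shuffled with the same permutation. The names make_key, shuffle_rows, and embed, the seed-based key, and the toy dimensions are all hypothetical and are not taken from the paper.

```python
import math
import random

def make_key(patch_len, seed):
    """Derive a secret pixel permutation from a seed (hypothetical key scheme)."""
    perm = list(range(patch_len))
    random.Random(seed).shuffle(perm)
    return perm

def shuffle_rows(rows, perm):
    """Reorder a sequence (pixels or matrix rows) with the secret permutation."""
    return [rows[i] for i in perm]

def embed(patch, weights):
    """Linear patch embedding: flattened patch (length P) times a P x D matrix."""
    dim = len(weights[0])
    return [sum(p * w[d] for p, w in zip(patch, weights)) for d in range(dim)]

rng = random.Random(0)
P, D = 16 * 16 * 3, 8  # flattened 16x16 RGB patch, toy embedding width (assumed)
patch = [rng.randrange(256) for _ in range(P)]
weights = [[rng.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(P)]

key = make_key(P, seed=1234)
enc_patch = shuffle_rows(patch, key)      # "encrypted" query patch
enc_weights = shuffle_rows(weights, key)  # embedding matched to the same key

plain_out = embed(patch, weights)
enc_out = embed(enc_patch, enc_weights)   # matched key: same embedding
wrong_out = embed(enc_patch, weights)     # unmatched model: different embedding

matches = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-6)
    for a, b in zip(plain_out, enc_out)
)
print("matched key reproduces plain embedding:", matches)
```

Because the dot product only sums over pixel positions, permuting the pixels and the embedding rows identically leaves each embedding value unchanged up to floating-point rounding, whereas feeding the shuffled patch to the unmodified embedding does not. This is consistent with the abstract's claim of unchanged accuracy, but it is not a reproduction of the authors' method.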
