Memorizing Structure-Texture Correspondence for Image Anomaly Detection

Kang Zhou; Jing Li; Yuting Xiao; Jianlong Yang; Jun Cheng; Wen Liu; Weixin Luo; Jiang Liu; Shenghua Gao

doi:10.1109/TNNLS.2021.3101403

IEEE Trans Neural Netw Learn Syst. 2022 Jun;33(6):2335-2349. doi: 10.1109/TNNLS.2021.3101403. Epub 2022 Jun 1.

PMID: 34388096
DOI: 10.1109/TNNLS.2021.3101403

Abstract

This work addresses image anomaly detection using only normal images in the training phase. Most previous methods tackle anomaly detection by reconstructing the input images with an autoencoder (AE)-based model, under the assumption that reconstruction errors are small for normal images and large for abnormal images. However, these AE-based methods sometimes reconstruct the anomalies well too, which makes them less sensitive to anomalies. To address this issue, we propose to reconstruct the image by leveraging the structure-texture correspondence. Specifically, we observe that, for normal images, the texture can usually be inferred from the corresponding structure (e.g., the blood vessels in a fundus image and the structured anatomy in an optical coherence tomography image), whereas for abnormal images it is hard to infer the texture from a destroyed structure. Therefore, a structure-texture correspondence memory (STCM) module is proposed to reconstruct image texture from its structure, where a memory mechanism is used to characterize the mapping from the normal structure to its corresponding normal texture. As the correspondence between destroyed structure and texture cannot be characterized by the memory, abnormal images yield a larger reconstruction error, which facilitates anomaly detection. In this work, we utilize two kinds of complementary structures (i.e., the semantic structure with human-labeled category information and the low-level structure with abundant details), which are extracted by two structure extractors. The reconstructions from the two kinds of structures are fused by a learned attention weight to obtain the final reconstructed image. We further feed the reconstructed image into the same two structure extractors. On the one hand, constraining the structures extracted from the original input to be consistent with those extracted from the reconstructed image regularizes the network training; on the other hand, the error between these two sets of structures can also be used as a supplementary measure to identify anomalies. Extensive experiments validate the effectiveness of our method for image anomaly detection on both industrial inspection images and medical images.
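
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the memory is addressed, how the attention weight is computed, or how the two errors are combined. The sketch assumes a soft, cosine-similarity memory read, a convex combination for the fusion, and a weighted sum of mean squared errors for the score. All function names, the key/value memory layout, and the weight `lam` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def memory_read(structure_feat, keys, values):
    """Assumed memory read: structure features query the memory keys by
    cosine similarity, and the output is the matching mixture of the stored
    texture values. Shapes: (N, Ds), (M, Ds), (M, Dt) -> (N, Dt)."""
    q = structure_feat / (np.linalg.norm(structure_feat, axis=-1, keepdims=True) + 1e-8)
    k = keys / (np.linalg.norm(keys, axis=-1, keepdims=True) + 1e-8)
    weights = softmax(q @ k.T, axis=-1)  # one addressing distribution per query
    return weights @ values


def fuse(recon_semantic, recon_lowlevel, attention):
    """Assumed fusion: convex combination of the two reconstructions,
    with `attention` in [0, 1] (scalar or per-pixel map)."""
    return attention * recon_semantic + (1.0 - attention) * recon_lowlevel


def anomaly_score(image, recon, struct_in, struct_recon, lam=1.0):
    """Assumed score: image reconstruction error plus `lam` times the error
    between structures of the input and of the reconstruction."""
    image_err = np.mean((image - recon) ** 2)
    struct_err = np.mean((struct_in - struct_recon) ** 2)
    return image_err + lam * struct_err
```

Under these assumptions, a structure that resembles no stored key is still mapped to a mixture of normal textures, so the reconstruction departs from the abnormal input and both error terms grow.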

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Humans
Image Processing, Computer-Assisted* / methods
Neural Networks, Computer*