SCFormer: Spectral Coordinate Transformer for Cross-Domain Few-Shot Hyperspectral Image Classification

IEEE Trans Image Process. 2024:33:840-855. doi: 10.1109/TIP.2024.3351443. Epub 2024 Jan 19.

Abstract

Cross-domain (CD) hyperspectral image classification (HSIC) has been significantly advanced by methods that employ Few-Shot Learning (FSL) based on CNNs or GCNs. Nevertheless, most current approaches disregard the prior information carried by spectral coordinates and offer limited interpretability, which weakens robustness and knowledge transfer. In this paper, we propose an asymmetric encoder-decoder architecture, the Spectral Coordinate Transformer (SCFormer), for the cross-domain few-shot (CDFSL) HSIC task. Several densely connected Spectral Coordinate blocks (SC blocks) are embedded in the encoder backbone; they integrate spectral coordinates via Rotary Position Embedding (RoPE) to mitigate the spectral-position disturbance introduced by the convolution operation, yielding feature representations that generalize better. Because hyperspectral image data are abundant and cross-domain scenarios place high demands on model generalization, we design two mask patterns (Random Mask and Sequential Mask) built on the otherwise unexploited spectral coordinates within the SC blocks; unified with the asymmetric structure, they allow high-capacity models to be learned efficiently and effectively with satisfactory generalization. In addition, on the loss-function side, we devise an intra-domain loss founded on Orthogonal Complement Space Projection (OCSP) theory to encourage samples to aggregate in the metric space, which promotes intra-domain consistency and improves interpretability. Finally, the strengthened class-expression capacity induced by the intra-domain loss supports an inter-domain loss constructed from the Wasserstein Distance (WD), which realizes domain alignment. Experimental results on four benchmark data sets demonstrate the superiority of SCFormer.
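The abstract does not give the exact form of the SC blocks, but the RoPE component they rely on can be sketched in plain NumPy. The function name `rope`, the half-split channel pairing (GPT-NeoX style), and the `base` frequency constant are illustrative assumptions, not the paper's implementation; the sketch only shows how a spectral coordinate can be injected as a norm-preserving rotation of feature pairs:

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotate pairs of feature channels by angles proportional to each
    sample's spectral coordinate (half-split pairing; illustrative only)."""
    n, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)        # per-pair rotation frequencies
    angles = positions[:, None] * freqs[None, :]     # (n, half) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Each (x1_i, x2_i) pair is rotated by its angle; vector norms are preserved,
    # and inner products depend only on the *difference* of two positions.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

The relative-position property is what makes RoPE attractive here: the similarity between two embedded spectra depends on how far apart their band coordinates are, not on their absolute indices.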
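The Random and Sequential Mask patterns are described only at a high level; a minimal sketch of how such band-index masks might be drawn is given below. The function names, the `ratio` convention (fraction of bands dropped), and the contiguous-run interpretation of "Sequential" are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(n_bands, ratio):
    """Keep a random subset of spectral band indices, dropping `ratio` of them."""
    keep = rng.permutation(n_bands)[: int(n_bands * (1 - ratio))]
    return np.sort(keep)

def sequential_mask(n_bands, ratio):
    """Drop one contiguous run of bands and keep the rest, in order."""
    drop = int(n_bands * ratio)
    start = rng.integers(0, n_bands - drop + 1)
    return np.concatenate([np.arange(start), np.arange(start + drop, n_bands)])
```

In a masked-autoencoder-style asymmetric design, the encoder would see only the kept bands, which is what makes training high-capacity models cheap despite large hyperspectral inputs.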
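The intra-domain loss builds on Orthogonal Complement Space Projection; the underlying projector is standard linear algebra and can be sketched as follows. How SCFormer turns this projector into a loss is not specified in the abstract, so only the projection itself is shown:

```python
import numpy as np

def oc_projector(A):
    """Projector onto the orthogonal complement of col(A).

    P = I - A (A^T A)^{-1} A^T, assuming A has full column rank.
    P is symmetric, idempotent, and annihilates every column of A.
    """
    n = A.shape[0]
    return np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)
```

Intuitively, penalizing the component of a sample that survives projection onto the orthogonal complement of its class subspace pulls samples toward that subspace, which matches the stated goal of aggregating samples in the metric space.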
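For the inter-domain loss, the abstract names the Wasserstein Distance. The paper's actual multi-dimensional formulation is not given here, but in one dimension with equal-sized empirical samples, WD-1 reduces to the mean absolute difference of sorted values, a convenient closed form for a sanity-check sketch:

```python
import numpy as np

def wd_1d(a, b):
    """1-D Wasserstein-1 distance between equal-sized empirical samples.

    For equal sample sizes, the optimal transport plan matches order
    statistics, so the distance is the mean |sorted(a) - sorted(b)|.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    assert a.shape == b.shape, "sketch assumes equal-sized samples"
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))
```

Applied per feature dimension to source- and target-domain embeddings, minimizing such a distance pushes the two feature distributions together, which is the domain-alignment role the abstract assigns to the WD-based inter-domain loss.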