HyFormer: Hybrid Transformer and CNN for Pixel-Level Multispectral Image Land Cover Classification

Int J Environ Res Public Health. 2023 Feb 9;20(4):3059. doi: 10.3390/ijerph20043059.

Abstract

To effectively solve the problems that most convolutional neural networks cannot be applied to the pixelwise input in remote sensing (RS) classification and cannot adequately represent the spectral sequence information, we propose a new multispectral RS image classification framework called HyFormer based on Transformer. First, a network framework combining a fully connected layer (FC) and convolutional neural network (CNN) is designed, and the 1D pixelwise spectral sequences obtained from the fully connected layers are reshaped into a 3D spectral feature matrix for the input of CNN, which enhances the dimensionality of the features through FC as well as increasing the feature expressiveness, and can solve the problem that 2D CNN cannot achieve pixel-level classification. Secondly, the features of the three levels of CNN are extracted and combined with the linearly transformed spectral information to enhance the information expression capability, and also used as the input of the transformer encoder to improve the features of CNN using the powerful global modelling capability of the Transformer, and finally the skip connection of the adjacent encoders to enhance the fusion between different levels of information. The pixel classification results are obtained by MLP Head. In this paper, we mainly focus on the feature distribution in the eastern part of Changxing County and the central part of Nanxun District, Zhejiang Province, and conduct experiments based on Sentinel-2 multispectral RS images. The experimental results show that the overall accuracy of HyFormer for the study area classification in Changxing County is 95.37% and that of Transformer (ViT) is 94.15%. The experimental results show that the overall accuracy of HyFormer for the study area classification in Nanxun District is 95.4% and that of Transformer (ViT) is 94.69%, and the performance of HyFormer on the Sentinel-2 dataset is better than that of the Transformer.

Keywords: CNN; multispectral RS image classification; pixelwise classification; transformer.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Electric Power Supplies*
  • Neural Networks, Computer*
  • Telemetry

Grants and funding

This document is the results of the research project funded by the National Natural Science Foundation of China: 62261004 and 62001129.