SLMFNet: Enhancing land cover classification of remote sensing images through selective attentions and multi-level feature fusion

PLoS One. 2024 May 14;19(5):e0301134. doi: 10.1371/journal.pone.0301134. eCollection 2024.

Abstract

Land cover classification (LCC) is of paramount importance for assessing environmental changes in remote sensing images (RSIs) as it involves assigning categorical labels to ground objects. The growing availability of multi-source RSIs presents an opportunity for intelligent LCC through semantic segmentation, offering a comprehensive understanding of ground objects. Nonetheless, the heterogeneous appearances of terrains and objects contribute to significant intra-class variance and inter-class similarity at various scales, adding complexity to this task. In response, we introduce SLMFNet, an encoder-decoder segmentation network designed to address this challenge. To mitigate the sparse and imbalanced distribution of RSIs, we incorporate selective attention modules (SAMs) that enhance the distinguishability of learned representations by integrating contextual affinities within the spatial and channel domains through a compact number of matrix operations. Specifically, the selective position attention module (SPAM) employs spatial pyramid pooling (SPP) to resample feature anchors and compute contextual affinities. In tandem, the selective channel attention module (SCAM) captures channel-wise affinity: it first aggregates the feature maps into fewer channels and then generates pairwise channel attention maps between the aggregated channels and all channels. To harness fine-grained details across multiple scales, we introduce a multi-level feature fusion decoder with data-dependent upsampling (MLFD), which recovers and merges feature maps at diverse scales using a trainable projection matrix. Empirical results on the ISPRS Potsdam and DeepGlobe datasets show that SLMFNet outperforms various state-of-the-art methods. Ablation studies affirm the efficacy and precision of SAMs in the proposed model.
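The channel-attention idea described in the abstract (aggregate the feature maps into fewer channels, then compute attention between all channels and the aggregated ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how channels are aggregated or how the attention maps are normalized and applied, so the group averaging, the softmax, and the function names below (`aggregate_channels`, `selective_channel_attention`) are assumptions chosen only for illustration. The point it shows is that the attention map has size C x K rather than C x C, which is where the saving in matrix operations would come from.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def aggregate_channels(features, k):
    """Reduce C channels to k by averaging contiguous groups.

    Assumption: plain group averaging stands in for whatever learned
    aggregation the paper uses.
    """
    c = len(features)
    if c % k != 0:
        raise ValueError("number of channels must be divisible by k")
    size = c // k
    aggregated = []
    for g in range(k):
        group = features[g * size:(g + 1) * size]
        aggregated.append([sum(vals) / size for vals in zip(*group)])
    return aggregated


def selective_channel_attention(features, k):
    """Attend from every channel to k aggregated channels.

    features: list of C channels, each a flat list of N spatial values.
    Returns (attention, output): attention is C x k with rows summing
    to 1, and output is C x N, each row a weighted mix of the
    aggregated channels.
    """
    aggregated = aggregate_channels(features, k)
    # Affinity between each original channel and each aggregated channel.
    attention = [softmax([dot(ch, agg) for agg in aggregated])
                 for ch in features]
    n = len(features[0])
    output = [
        [sum(w * agg[i] for w, agg in zip(weights, aggregated))
         for i in range(n)]
        for weights in attention
    ]
    return attention, output
```

For example, calling `selective_channel_attention(features, k=2)` on 4 channels of 3 values each yields a 4 x 2 attention map and a 4 x 3 output. A real network would typically add a residual connection and learned projections; both are omitted here to keep the sketch short.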

MeSH terms

  • Algorithms
  • Image Processing, Computer-Assisted / methods
  • Neural Networks, Computer
  • Remote Sensing Technology* / methods

Grants and funding

This research was partially supported by the Special Funds for Basic Research Operating Expenses of Central-level Public Welfare Research Institutes (Grant No. HKY-JBYW-2023-20), the Excellent Post-doctoral Program of Jiangsu Province (Grant No. 2022ZB166), the Fundamental Research Funds for the Central Universities under Grant No. B230201007, and the National Natural Science Foundation of China (Grant No. 42104033 and 42101343). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.