Deep-agriNet: a lightweight attention-based encoder-decoder framework for crop identification using multispectral images

Yimin Hu; Ao Meng; Yanjun Wu; Le Zou; Zhou Jin; Taosheng Xu

doi:10.3389/fpls.2023.1124939

Front Plant Sci. 2023 Apr 18:14:1124939. doi: 10.3389/fpls.2023.1124939. eCollection 2023.

Authors

Yimin Hu¹ ², Ao Meng¹, Yanjun Wu² ³, Le Zou¹, Zhou Jin², Taosheng Xu²

Affiliations

¹ School of Big Data And Artificial Intelligence, Hefei University, Hefei, China.
² Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China.
³ Science Island Branch, University of Science and Technology of China, Hefei, China.

Abstract

Computer vision has shown great potential for identifying crops at large scales from multispectral images. However, designing a crop identification network requires striking a balance between accuracy and a lightweight framework. Furthermore, accurate recognition methods are lacking for crops that are not planted at large scale. In this paper, we propose an improved encoder-decoder framework based on DeepLab v3+ to accurately identify crops with different planting patterns. The network employs ShuffleNet v2 as the backbone to extract features at multiple levels. The decoder module integrates a convolutional block attention mechanism, which combines channel attention and spatial attention to fuse features across both dimensions. We establish two datasets, DS1 and DS2, where DS1 is obtained from areas with large-scale crop planting and DS2 is obtained from areas with scattered crop planting. On DS1, the improved network achieves a mean intersection over union (mIoU) of 0.972, an overall accuracy (OA) of 0.981, and a recall of 0.980, improvements of 7.0%, 5.0%, and 5.7%, respectively, over the original DeepLab v3+. On DS2, the improved network improves the mIoU, OA, and recall by 5.4%, 3.9%, and 4.4%, respectively. Notably, the number of parameters and the giga floating-point operations (GFLOPs) required by the proposed Deep-agriNet are significantly lower than those of DeepLab v3+ and other classic networks. Our findings demonstrate that Deep-agriNet performs better in identifying crops with different planting scales and can serve as an effective tool for crop identification in various regions and countries.
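
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how mIoU, OA, and recall are commonly computed from a pixel-level confusion matrix. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `segmentation_metrics`, the macro-averaging of recall over classes, and the two-class example matrix are all hypothetical and do not come from the paper.

```python
def segmentation_metrics(confusion):
    """Compute mIoU, OA, and macro-averaged recall.

    confusion[i][j] is the number of pixels whose true class is i and
    whose predicted class is j. Macro-averaging recall over classes is
    an assumption here; the paper may define recall differently.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no pixels")

    diag = [confusion[k][k] for k in range(n)]                 # correct pixels per class
    true_counts = [sum(confusion[k]) for k in range(n)]        # row sums
    pred_counts = [sum(confusion[i][k] for i in range(n))      # column sums
                   for k in range(n)]

    ious, recalls = [], []
    for k in range(n):
        union = true_counts[k] + pred_counts[k] - diag[k]
        if union > 0:                  # skip classes absent from both truth and prediction
            ious.append(diag[k] / union)
        if true_counts[k] > 0:         # recall is undefined without true pixels
            recalls.append(diag[k] / true_counts[k])

    return {
        "mIoU": sum(ious) / len(ious),
        "OA": sum(diag) / total,
        "recall": sum(recalls) / len(recalls),
    }


# Hypothetical two-class example (background vs. crop), not data from the paper.
metrics = segmentation_metrics([[50, 10],
                                [5, 35]])
print(metrics)
```

Because mIoU penalizes both false positives and false negatives for every class, it is usually the strictest of the three figures, which is consistent with it being the lowest of the values reported on DS1.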

Keywords: DeepLab v3+; crop identification; encoder-decoder; feature extraction; lightweight; multispectral image.

Grants and funding

This work was supported by the National Key Research and Development Program of China [2021YFD2000205].