Fusing attention mechanism with Mask R-CNN for instance segmentation of grape cluster in the field

Front Plant Sci. 2022 Jul 22:13:934450. doi: 10.3389/fpls.2022.934450. eCollection 2022.

Abstract

Accurately detecting and segmenting grape cluster in the field is fundamental for precision viticulture. In this paper, a new backbone network, ResNet50-FPN-ED, was proposed to improve Mask R-CNN instance segmentation so that the detection and segmentation performance can be improved under complex environments, cluster shape variations, leaf shading, trunk occlusion, and grapes overlapping. An Efficient Channel Attention (ECA) mechanism was first introduced in the backbone network to correct the extracted features for better grape cluster detection. To obtain detailed feature map information, Dense Upsampling Convolution (DUC) was used in feature pyramid fusion to improve model segmentation accuracy. Moreover, model generalization performance was also improved by training the model on two different datasets. The developed algorithm was validated on a large dataset with 682 annotated images, where the experimental results indicate that the model achieves an Average Precision (AP) of 60.1% on object detection and 59.5% on instance segmentation. Particularly, on object detection task, the AP improved by 1.4% and 1.8% over the original Mask R-CNN (ResNet50-FPN) and Faster R-CNN (ResNet50-FPN). For the instance segmentation, the AP improved by 1.6% and 2.2% over the original Mask R-CNN and SOLOv2. When tested on different datasets, the improved model had high detection and segmentation accuracy and inter-varietal generalization performance in complex growth environments, which is able to provide technical support for intelligent vineyard management.

Keywords: Dense Upsampling Convolution; Mask R-CNN; attention mechanism; grape; instance segmentation.