Attention-Guided Instance Segmentation for Group-Raised Pigs

Animals (Basel). 2023 Jul 3;13(13):2181. doi: 10.3390/ani13132181.

Abstract

In the pig farming environment, complex factors such as pig adhesion, occlusion, and changes in body posture pose significant challenges for segmenting multiple target pigs. To address these challenges, this study collected video data using a horizontal angle of view and a non-fixed lens. Specifically, a total of 45 pigs aged 20-105 days in 8 pens were selected as research subjects, yielding 1917 labeled images, which were divided into 959 for training, 192 for validation, and 766 for testing. A grouped attention module was employed in the feature pyramid network to fuse the feature maps from deep and shallow layers. The grouped attention module consists of a channel attention branch and a spatial attention branch. The channel attention branch models dependencies between channels to enhance the feature mapping of related channels and improve semantic feature representation. The spatial attention branch establishes pixel-level dependencies by aggregating the response values of all pixels in a single-channel feature map onto each target pixel; it further guides the original feature map to filter spatial location information and generate context-aware outputs. The grouped attention module, together with data augmentation strategies, was incorporated into the Mask R-CNN and Cascade Mask R-CNN task networks to explore their impact on pig segmentation. The experiments showed that data augmentation improved the segmentation performance of the models to a certain extent: taking Mask R-CNN as an example, under the same experimental conditions, data augmentation improved AP50, AP75, APL, and AP by 1.5%, 0.7%, 0.4%, and 0.5%, respectively. Furthermore, the grouped attention module achieved the best performance; with Mask R-CNN, it outperformed the existing CBAM attention module by 1.0%, 0.3%, 1.1%, and 1.2% on AP50, AP75, APL, and AP, respectively. We also studied how the number of groups in the grouped attention affects the final segmentation results. Additionally, visualized predictions on third-party data, which were collected with a top-down acquisition method and not used in model training, showed that the proposed model still achieved good segmentation results, demonstrating the transferability and robustness of the grouped attention. Through comprehensive analysis, we found that grouped attention is beneficial for achieving high-precision segmentation of individual pigs across different scenes, ages, and time periods. The results can serve as a reference for subsequent applications such as pig identification and behavior analysis in mobile settings.

Keywords: attention mechanism; channel attention; feature pyramid network; image segmentation; spatial attention.
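For readers who want a concrete picture of the mechanism summarized in the abstract, the following PyTorch sketch illustrates one plausible form of a grouped attention block: the input channels are split into groups, each group is refined by an SE-style channel-attention branch and a softmax-based spatial context branch, and the group outputs are concatenated. The class name GroupedAttention, the squeeze-and-excitation channel branch, the global-context spatial branch, the additive fusion, and the parameter sharing across groups are all illustrative assumptions; the paper's exact design (including how the two branches are fused inside the feature pyramid network) may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedAttention(nn.Module):
    """Illustrative grouped attention block (not the authors' exact design).

    Channels are split into groups; each group passes through a channel
    attention branch and a spatial attention branch, and the refined groups
    are concatenated back to the original channel count.
    """

    def __init__(self, channels: int, groups: int = 4, reduction: int = 16):
        super().__init__()
        assert channels % groups == 0, "channels must be divisible by groups"
        self.groups = groups
        c = channels // groups  # channels per group

        # Channel attention branch: SE-style squeeze-and-excitation per group.
        self.channel_fc = nn.Sequential(
            nn.Conv2d(c, max(c // reduction, 4), kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(max(c // reduction, 4), c, kernel_size=1),
            nn.Sigmoid(),
        )

        # Spatial attention branch: a 1x1 conv produces a single-channel
        # response map; a softmax over all pixels turns it into weights that
        # relate every pixel's response to each target location.
        self.spatial_conv = nn.Conv2d(c, 1, kernel_size=1)
        self.context_transform = nn.Conv2d(c, c, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        outs = []
        # Branch parameters are shared across groups in this sketch
        # (a simplification; per-group parameters are equally plausible).
        for xg in torch.chunk(x, self.groups, dim=1):
            # Channel attention: reweight the channels of this group.
            w_ch = self.channel_fc(F.adaptive_avg_pool2d(xg, 1))        # (b, c, 1, 1)
            x_ch = xg * w_ch

            # Spatial attention: aggregate responses from all pixels into a
            # global context vector and add it back to every location.
            resp = self.spatial_conv(xg).reshape(b, 1, h * w)           # (b, 1, HW)
            resp = F.softmax(resp, dim=-1)                              # pixel weights
            ctx = torch.bmm(xg.reshape(b, -1, h * w), resp.transpose(1, 2))  # (b, c, 1)
            ctx = self.context_transform(ctx.unsqueeze(-1))             # (b, c, 1, 1)
            x_sp = xg + ctx                                             # broadcast add

            outs.append(x_ch + x_sp)
        return torch.cat(outs, dim=1)


if __name__ == "__main__":
    # Example: refine a 256-channel FPN level before the detection heads.
    feat = torch.randn(2, 256, 64, 64)
    refined = GroupedAttention(channels=256, groups=4)(feat)
    print(refined.shape)  # torch.Size([2, 256, 64, 64])

Splitting the channels into groups keeps the per-branch parameter count small, which is one reason such a module can be inserted at every pyramid level with little overhead; the number of groups is the hyperparameter whose effect on segmentation accuracy the authors study.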