One-Shot Learning with Pseudo-Labeling for Cattle Video Segmentation in Smart Livestock Farming

Animals (Basel). 2022 Feb 23;12(5):558. doi: 10.3390/ani12050558.

Abstract

Computer vision-based technologies play a key role in precision livestock farming, and video-based analysis approaches have been advocated as useful tools for automatic animal monitoring, behavior analysis, and efficient welfare measurement and management. Accurately and efficiently segmenting animals' contours from their backgrounds is a prerequisite for vision-based technologies. Deep learning-based segmentation methods have shown good performance when trained on large numbers of pixel-labeled images. However, labeling animal images is challenging and time-consuming due to their irregular contours and changing postures. To reduce the reliance on labeled images, a one-shot learning approach with pseudo-labeling is proposed that uses only one labeled frame to segment animals in videos. The proposed approach consists mainly of an Xception-based Fully Convolutional Neural Network (Xception-FCN) module and a pseudo-labeling (PL) module. Xception-FCN utilizes depth-wise separable convolutions to learn visual features at different levels and to produce dense, localized predictions from the single labeled frame. The PL module then leverages the segmentation results of the Xception-FCN model to fine-tune the model, boosting performance in cattle video segmentation. Systematic experiments were conducted on a challenging feedlot cattle video dataset acquired by the authors, and the proposed approach achieved a mean intersection-over-union score of 88.7% and a contour accuracy of 80.8%, outperforming state-of-the-art methods (OSVOS and OSMN). Our proposed one-shot learning approach could serve as an enabling component for livestock farming-related segmentation and detection applications.
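The pseudo-labeling step described above can be sketched as follows. This is an illustrative outline only, not the authors' implementation: `make_pseudo_labels` stands in for the PL module, the confidence-thresholding rule is a common pseudo-labeling heuristic assumed here, and the probability map would in practice come from the Xception-FCN forward pass on an unlabeled frame.

```python
import numpy as np

def make_pseudo_labels(probs, lo=0.2, hi=0.8):
    """Turn per-pixel foreground probabilities into pseudo-labels.

    Only confident pixels are kept: probabilities >= hi become
    foreground labels, probabilities <= lo become background labels,
    and the mask marks which pixels contribute to the fine-tuning
    loss. Thresholds lo/hi are illustrative assumptions.
    """
    labels = (probs >= hi).astype(np.uint8)
    mask = (probs >= hi) | (probs <= lo)
    return labels, mask

# Fake per-pixel probabilities for one unlabeled video frame,
# standing in for the Xception-FCN output.
probs = np.array([[0.95, 0.50],
                  [0.10, 0.85]])
labels, mask = make_pseudo_labels(probs)
# Confident pixels (0.95, 0.10, 0.85) become pseudo-labels, while the
# ambiguous 0.50 pixel is masked out of the fine-tuning loss.
```

In a full pipeline, the (labels, mask) pairs from unlabeled frames would be mixed with the single ground-truth frame to fine-tune the segmentation network for a few additional iterations.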

Keywords: deep learning; one-shot learning; precision livestock farming; pseudo-labeling; video segmentation.