Small object detection algorithm incorporating swin transformer for tea buds

Meiling Shi; Dongling Zheng; Tianhao Wu; Wenjing Zhang; Ruijie Fu; Kailiang Huang

doi:10.1371/journal.pone.0299902

Small object detection algorithm incorporating swin transformer for tea buds

PLoS One. 2024 Mar 21;19(3):e0299902. doi: 10.1371/journal.pone.0299902. eCollection 2024.

Authors

Meiling Shi¹, Dongling Zheng¹, Tianhao Wu², Wenjing Zhang¹, Ruijie Fu¹, Kailiang Huang²

Affiliations

¹ College of Information Engineering, Sichuan Agricultural University, Ya'an, China.
² College of Electrical Engineering, Sichuan Agricultural University, Ya'an, China.

Abstract

Accurate identification of small tea buds is a key technology for tea harvesting robots, which directly affects tea quality and yield. However, due to the complexity of the tea plantation environment and the diversity of tea buds, accurate identification remains an enormous challenge. Current methods based on traditional image processing and machine learning fail to effectively extract subtle features and morphology of small tea buds, resulting in low accuracy and robustness. To achieve accurate identification, this paper proposes a small object detection algorithm called STF-YOLO (Small Target Detection with Swin Transformer and Focused YOLO), which integrates the Swin Transformer module and the YOLOv8 network to improve the detection ability of small objects. The Swin Transformer module extracts visual features based on a self-attention mechanism, which captures global and local context information of small objects to enhance feature representation. The YOLOv8 network is an object detector based on deep convolutional neural networks, offering high speed and precision. Based on the YOLOv8 network, modules including Focus and Depthwise Convolution are introduced to reduce computation and parameters, increase receptive field and feature channels, and improve feature fusion and transmission. Additionally, the Wise Intersection over Union loss is utilized to optimize the network. Experiments conducted on a self-created dataset of tea buds demonstrate that the STF-YOLO model achieves outstanding results, with an accuracy of 91.5% and a mean Average Precision of 89.4%. These results are significantly better than other detectors. Results show that, compared to mainstream algorithms (YOLOv8, YOLOv7, YOLOv5, and YOLOx), the model improves accuracy and F1 score by 5-20.22 percentage points and 0.03-0.13, respectively, proving its effectiveness in enhancing small object detection performance. This research provides technical means for the accurate identification of small tea buds in complex environments and offers insights into small object detection. Future research can further optimize model structures and parameters for more scenarios and tasks, as well as explore data augmentation and model fusion methods to improve generalization ability and robustness.

Copyright: © 2024 Shi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Algorithms*
Electric Power Supplies
Generalization, Psychological
Neural Networks, Computer*
Tea

Substances

Tea

Grants and funding

This work was supported by the Subsidy for University Student Entrepreneurship Training Program (No. 202310626003).