Deep-learning-based in-field citrus fruit detection and tracking

Hortic Res. 2022 Feb 11:9:uhac003. doi: 10.1093/hr/uhac003. Online ahead of print.

Abstract

Fruit yield estimation is crucial for establishing fruit harvesting and marketing strategies. Recently, computer vision and deep learning techniques have been used to estimate citrus fruit yield and have exhibited notable fruit detection ability. However, computer-vision-based citrus fruit counting has two key limitations: inconsistent fruit detection accuracy and double-counting of the same fruit. Using oranges as the experimental material, this paper proposes a deep-learning-based orange counting algorithm that uses video sequences to help overcome these problems. The algorithm consists of two sub-algorithms, OrangeYolo for fruit detection and OrangeSort for fruit tracking. The OrangeYolo backbone network is partially based on the YOLOv3 algorithm and improved to detect small fruit targets at multiple scales: the network structure was adjusted for small-scale targets while retaining multiscale target detection, and a channel attention and spatial attention multiscale fusion module was introduced to fuse the semantic features of the deep network with the shallow textural detail features. OrangeYolo reaches a mean Average Precision (mAP) of 0.957 on the citrus dataset, higher than the 0.905, 0.911, and 0.917 achieved by the YOLOv3, YOLOv4, and YOLOv5 algorithms. OrangeSort was designed to alleviate the double-counting of occluded fruits; it combines a counting strategy based on a dedicated tracking region with a tracking algorithm based on motion displacement estimation. Six video sequences, taken from two fields containing 22 trees, were used as the validation dataset. The proposed method showed better performance (Mean Absolute Error (MAE) = 0.081, Standard Deviation (SD) = 0.08) relative to video-based manual counting and demonstrated more accurate results than the standard Sort and DeepSort algorithms (MAE = 0.45 and 1.212; SD = 0.4741 and 1.3975, respectively).
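
The abstract describes fusing deep semantic features with shallow detail features through channel and spatial attention. The paper itself does not provide code in this abstract, so the following is only a minimal sketch, assuming a CBAM-style attention design in PyTorch; all module and variable names are illustrative, not the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' code): channel + spatial attention
# fusion of a deep semantic feature map with a shallow detail feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        # Squeeze spatial dims with avg and max pooling, then re-weight channels.
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1).flatten(1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1).flatten(1))
        w = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * w

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Collapse channels with avg and max, then learn a spatial weight map.
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class AttentionFusion(nn.Module):
    """Fuse deep (semantic) and shallow (detail) feature maps with equal channels."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, deep, shallow):
        # Upsample the deep map to the shallow map's resolution before fusing.
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode="nearest")
        fused = self.proj(torch.cat([deep, shallow], dim=1))
        return self.sa(self.ca(fused))
```

The abstract also states that OrangeSort reduces double-counting with a counting strategy restricted to a tracking region and a tracker based on motion displacement estimation. The sketch below is a hedged illustration of that general idea, not the published OrangeSort algorithm; the `Track` fields, region format, and helper names are assumptions.

```python
# Minimal sketch (assumed): count each track at most once, only when its box
# center enters a fixed counting region, and propagate unmatched tracks by
# their last estimated displacement (e.g., during temporary occlusion).
from dataclasses import dataclass

@dataclass
class Track:
    track_id: int
    center: tuple                        # (x, y) of the last matched detection
    displacement: tuple = (0.0, 0.0)     # last frame-to-frame motion estimate
    counted: bool = False

def update_count(tracks, counting_region, total):
    """counting_region = (x1, y1, x2, y2); increment total once per track."""
    x1, y1, x2, y2 = counting_region
    for t in tracks:
        cx, cy = t.center
        if not t.counted and x1 <= cx <= x2 and y1 <= cy <= y2:
            t.counted = True
            total += 1
    return total

def predict_missing(track):
    """Advance an unmatched track by its estimated displacement."""
    cx, cy = track.center
    dx, dy = track.displacement
    track.center = (cx + dx, cy + dy)
```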