An occluded cherry tomato recognition model based on improved YOLOv7

Guangyu Hou; Haihua Chen; Yike Ma; Mingkun Jiang; Chen Hua; Chunmao Jiang; Runxin Niu

doi:10.3389/fpls.2023.1260808

Front Plant Sci. 2023 Oct 20:14:1260808. doi: 10.3389/fpls.2023.1260808. eCollection 2023.

Authors

Guangyu Hou¹ ², Haihua Chen³, Yike Ma³, Mingkun Jiang¹ ², Chen Hua¹ ², Chunmao Jiang¹ ², Runxin Niu¹

Affiliations

¹ Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China.
² Science Island Branch, University of Science and Technology of China, Hefei, China.
³ Institute of Computer Science, Chinese Academy of Sciences, Beijing, China.

Abstract

Occlusion of cherry tomatoes in natural environments is one of the most critical factors limiting the picking accuracy of cherry tomato picking robots. To recognize occluded cherry tomatoes accurately and efficiently with deep convolutional neural networks, a new occluded cherry tomato recognition model, DSP-YOLOv7-CA, is proposed. First, images of cherry tomatoes with different degrees of occlusion are acquired, four occlusion areas and four occlusion methods are defined, and a cherry tomato dataset (TOSL) is constructed. Second, based on YOLOv7, the convolution module on the original residual edges is replaced with null residual edges, depthwise separable convolutional layers are added, and skip connections are added to reuse feature information. Third, a depthwise separable convolutional layer is added to the SPPF module, which has fewer parameters, and this module replaces the original SPPCSPC module to address the loss of small-target information caused by the different pooled residual layers. Finally, a coordinate attention (CA) layer is introduced at the critical position of the enhanced feature extraction network to strengthen attention to occluded cherry tomatoes. The experimental results show that the DSP-YOLOv7-CA model outperforms other object detection models, with a mean average precision (mAP) of 98.86%, and the size of the model parameters is reduced from 37.62 MB to 33.71 MB. The model also performs better in actual detection of cherry tomatoes with less than 95% occlusion. Relatively average results were obtained for cherry tomatoes with an occlusion level higher than 95%, but such cherry tomatoes were not targeted for picking. The DSP-YOLOv7-CA model can accurately recognize occluded cherry tomatoes in the natural environment, providing an effective solution for accurate picking by cherry tomato picking robots.
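
To illustrate why depthwise separable convolution tends to reduce parameter counts, the following minimal sketch compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution. The layer sizes (256 input channels, 256 output channels, 3 x 3 kernel) and the function names are assumptions chosen only for illustration; they are not taken from the paper, and the sketch is not the authors' implementation. The last line computes the relative reduction implied by the 37.62 MB and 33.71 MB figures reported in the abstract.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only (not from the paper).
c_in, c_out, k = 256, 256, 3

standard = conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
ratio = separable / standard  # equals 1/c_out + 1/k**2 for this layer

# Relative reduction in model size implied by the abstract's figures.
reported_reduction = (37.62 - 33.71) / 37.62

print(f"standard: {standard}, separable: {separable}, ratio: {ratio:.3f}")
print(f"reported whole-model reduction: {reported_reduction:.1%}")
```

For a single layer of this size the separable form needs roughly 11.5% of the weights of the standard form. The whole-model reduction implied by the abstract is smaller, about 10.4%, which is consistent with only some modules of the network being modified.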

Keywords: DSP-YOLOv7-CA; cherry tomato picking robot; coordinate attention mechanism; depth separable convolution; object detection; residual module.

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by the following funds: Strategic Priority Research Program of Chinese Academy of Sciences, Grant No. XDA28120000 and XDA28040000; The Subproject of The National Key R & D Program, Grant No. 2022YFD2001404-01; Natural Science Foundation of Shandong Province, Grant No. ZR2021MF094 and ZR2020KF030; Key R & D Plan of Shandong Province, Grant No. 2020CXGC010804; Central Leading Local Science and Technology Development Special Fund Project, Grant No. YDZX2021122; Science & Technology Specific Projects in Agricultural High-tech Industrial Demonstration Area of the Yellow River Delta, Grant No. 2022SZX11.