Jiangtao Qi, Xv Cong, Weirong Zhang, Fangfang Gao, Bo Zhao, Hui Guo
"Rapid Detection of Ripe Tomatoes in Unstructured Environments"
Journal of Field Robotics, Vol. 42, No. 6, pp. 2920-2935. Published 2025-04-15.
DOI: 10.1002/rob.22556 (https://onlinelibrary.wiley.com/doi/10.1002/rob.22556)
Citations: 0
Abstract
To achieve efficient detection of ripe tomatoes in unstructured environments, this paper proposed an improved YOLOv7 rapid detection network for ripe tomatoes. First, the CSP-Darknet53 structure of the original YOLOv7 backbone was replaced with the FasterNet structure to improve detection efficiency and reduce the number of model parameters. Second, a Global Attention Mechanism (GAM) was introduced to strengthen tomato feature representation at the cost of only a small increase in parameters. Next, a Diverse Branch Block (DBB) module was integrated into the ELAN module of the head structure to improve inference efficiency. Finally, the batch normalization scale factor γ was selected as the sparsity indicator: an L1 regularization term on γ was used to train the original model toward sparsity, and the slim pruning algorithm was then applied for global channel pruning to compress the model. The pruned model was retrained through fine-tuning to restore detection accuracy to near its pre-pruning level. Experimental results show that the improved model achieves a mean average precision of 96.49%, essentially unchanged from the original model, while the parameter count, computational cost, and model size are reduced by 52.16%, 56.84%, and 36.95%, respectively, yielding a 32.09% increase in recognition frame rate. Compared with similar object detection models (SSD, YOLOv3, YOLOv4, YOLOv5s, YOLOX, and YOLOv8), the Improved-YOLOv7 model reduces the parameter count by 4.44%-89.05%, computational complexity by 30.37%-91.18%, and model size by 26.43%-72.16%. This work provides technical support for recognizing ripe tomatoes in unstructured environments.
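The sparsity-training and pruning pipeline described in the abstract follows the general pattern of BatchNorm-scale-based channel pruning: an L1 penalty pushes BN γ values toward zero during training, then a single global threshold selects which channels survive. The sketch below is not the authors' code; it is a minimal pure-Python illustration of that pattern, with the layer names, learning rate, sparsity factor, and prune ratio all chosen as assumptions for the example.

```python
def l1_sparsity_step(gammas, lr=0.01, s=1e-4):
    """One L1-subgradient update on BN gamma values.

    Sparsity training adds s * sum(|gamma|) to the loss; its
    subgradient contributes s * sign(gamma) to each gamma's gradient,
    shrinking unimportant scale factors toward zero.
    """
    sign = lambda g: (g > 0) - (g < 0)
    return [g - lr * s * sign(g) for g in gammas]


def global_channel_masks(layer_gammas, prune_ratio):
    """Global channel pruning via a single |gamma| threshold.

    layer_gammas: dict mapping layer name -> list of BN gamma values.
    prune_ratio:  fraction of channels (across ALL layers) to remove.
    Returns a dict of boolean keep-masks, one per layer.
    """
    # Pool every channel's |gamma| across the whole network, then pick
    # the threshold so that prune_ratio of channels fall below it.
    all_g = sorted(abs(g) for gs in layer_gammas.values() for g in gs)
    cut = int(prune_ratio * len(all_g))
    threshold = all_g[cut] if cut < len(all_g) else float("inf")
    return {name: [abs(g) >= threshold for g in gs]
            for name, gs in layer_gammas.items()}


# Example (hypothetical layer names): near-zero gammas are pruned
# regardless of which layer they sit in.
gammas = {"elan1": [0.9, 0.02, 0.7], "elan2": [0.01, 0.8, 0.03]}
masks = global_channel_masks(gammas, prune_ratio=0.5)
```

After pruning, the surviving channels are copied into a smaller network and fine-tuned, which is how the paper recovers accuracy to near the pre-pruning level.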
Journal Description:
The Journal of Field Robotics seeks to promote scholarly publications dealing with the fundamentals of robotics in unstructured and dynamic environments.
The Journal focuses on experimental robotics and encourages publication of work that has both theoretical and practical significance.