Advancing capsicum detection in night-time greenhouse environments using deep learning models: Comparative analysis and improved zero-shot detection through fusion with a single-shot detector
{"title":"Advancing capsicum detection in night-time greenhouse environments using deep learning models: Comparative analysis and improved zero-shot detection through fusion with a single-shot detector","authors":"Ayan Paul, Rajendra Machavaram","doi":"10.1016/j.fraope.2025.100243","DOIUrl":null,"url":null,"abstract":"<div><div>This study addresses capsicum detection in night-time greenhouse settings using a robust approach. A dataset of 300 images was curated, capturing various shooting distances, heights, occlusions, and lighting intensities, and underwent extensive pre-processing and augmentation. The single-shot custom-trained You Only Look Once version 9 (YOLOv9) model was evaluated, achieving precision, recall, F1 score, and mean Average Precision (mAP) of 0.898, 0.864, 0.881, and 0.947, respectively, with a detection speed of 38.46 frames per second (FPS). Concurrently, the zero-shot Grounding self-DIstillation with NO labels (Grounding DINO) model required no training and was hypertuned for capsicum detection using Google Colaboratory. Utilizing its Open Vocabulary Object Detection (OVOD) capability, the model successfully performed capsicum detection, positional search, growth stage detection, and diseased capsicum detection with confidence scores of 74 %, 43 %, 74 %, and 43 %, respectively. Comparative testing of both models on 100 test images containing 175 capsicums showed that YOLOv9 outperformed Grounding DINO with precision, recall, and F1 scores of 0.88, 0.86, and 0.87, compared to Grounding DINO's 0.72, 0.69, and 0.70. YOLOv9 also demonstrated an inference speed of 26 milliseconds, approximately five times faster than Grounding DINO. The fusion of YOLOv9 and Grounding DINO into You Only Look Once version Open Vocabulary Object Detection (YOLOvOVOD) significantly improved performance, achieving the highest confidence of 88 % for growth stage detection and a 65.11 % increase in confidence for positional search. This integrated approach leverages the strengths of both models, presenting a robust solution for future automation in agricultural machine vision.</div></div>","PeriodicalId":100554,"journal":{"name":"Franklin Open","volume":"10 ","pages":"Article 100243"},"PeriodicalIF":0.0000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Franklin Open","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2773186325000337","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
This study presents a robust approach to capsicum detection in night-time greenhouse settings. A dataset of 300 images was curated to capture varied shooting distances, heights, occlusions, and lighting intensities, and underwent extensive pre-processing and augmentation. A custom-trained single-shot detector, You Only Look Once version 9 (YOLOv9), achieved precision, recall, F1 score, and mean Average Precision (mAP) of 0.898, 0.864, 0.881, and 0.947, respectively, at a detection speed of 38.46 frames per second (FPS). Concurrently, the zero-shot Grounding DINO model (Grounding DETR with Improved deNoising anchOr boxes) required no training; its hyperparameters were tuned for capsicum detection in Google Colaboratory. Using its Open Vocabulary Object Detection (OVOD) capability, the model performed capsicum detection, positional search, growth-stage detection, and diseased-capsicum detection with confidence scores of 74%, 43%, 74%, and 43%, respectively. Comparative testing of both models on 100 test images containing 175 capsicums showed that YOLOv9 outperformed Grounding DINO, with precision, recall, and F1 scores of 0.88, 0.86, and 0.87 versus Grounding DINO's 0.72, 0.69, and 0.70. YOLOv9 also achieved an inference time of 26 milliseconds per image, approximately five times faster than Grounding DINO. Fusing YOLOv9 and Grounding DINO into You Only Look Once version Open Vocabulary Object Detection (YOLOvOVOD) significantly improved performance, reaching the highest confidence of 88% for growth-stage detection and a 65.11% increase in confidence for positional search. This integrated approach leverages the strengths of both models, offering a robust solution for future automation in agricultural machine vision.
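To make the zero-shot OVOD step concrete, the sketch below shows how Grounding DINO can be prompted with free-text phrases via the Hugging Face transformers port. The checkpoint name, thresholds, prompt phrases, and image filename are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal zero-shot capsicum detection sketch with Grounding DINO
# (Hugging Face transformers port). Checkpoint, thresholds, and prompt
# phrases are assumptions -- the paper's exact settings are not given here.
import torch
from PIL import Image
from transformers import AutoProcessor, GroundingDinoForObjectDetection

model_id = "IDEA-Research/grounding-dino-base"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = GroundingDinoForObjectDetection.from_pretrained(model_id).eval()

image = Image.open("night_greenhouse.jpg").convert("RGB")  # hypothetical file
# OVOD text prompt: lowercase phrases separated by periods.
text = "a ripe capsicum. an unripe green capsicum. a diseased capsicum."

inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Keep boxes whose phrase-grounding score clears both thresholds.
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.35,   # assumed value
    text_threshold=0.25,  # assumed value
    target_sizes=[image.size[::-1]],
)[0]
for box, score, label in zip(results["boxes"], results["scores"], results["labels"]):
    print(f"{label}: {score:.2f} at {box.tolist()}")
```

Changing only the prompt string is what enables the positional-search, growth-stage, and diseased-capsicum queries reported above, with no retraining.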
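The comparison metrics follow the standard detection definitions. The snippet below is a worked check, with hypothetical true/false positive counts chosen only to show how YOLOv9's reported 0.88 / 0.86 / 0.87 on the 175 test capsicums fit together; the actual counts are not stated in the abstract.

```python
# Detection metrics from true positive (tp), false positive (fp), and
# false negative (fn) counts. The counts below are assumptions used to
# reproduce the reported precision/recall/F1, not figures from the paper.
def detection_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp)            # fraction of predictions that are correct
    recall = tp / (tp + fn)               # fraction of ground truth that is found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Example: of 175 ground-truth capsicums, suppose 151 were detected
# with 21 false positives (151 + 24 missed = 175 ground truth).
p, r, f1 = detection_metrics(tp=151, fp=21, fn=24)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # 0.88 / 0.86 / 0.87
```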
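The abstract does not spell out the YOLOvOVOD fusion rule, so the following is one plausible scheme under stated assumptions: keep YOLOv9's fast, high-confidence boxes for localization, then attach Grounding DINO's open-vocabulary phrase (e.g., a growth stage) to each box it overlaps. The IoU matching rule, the 0.5 cutoff, and the confidence-averaging step are all assumptions; the authors' actual fusion logic may differ.

```python
# A plausible box-level fusion sketch for YOLOvOVOD (assumed scheme, not
# the paper's confirmed method): YOLOv9 supplies the boxes, Grounding DINO
# supplies open-vocabulary labels for boxes it also detects.
from dataclasses import dataclass

@dataclass
class Detection:
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    score: float
    label: str

def iou(a, b) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def fuse(yolo_dets: list[Detection], dino_dets: list[Detection],
         iou_thr: float = 0.5) -> list[Detection]:
    """Relabel each YOLOv9 box with the best-overlapping DINO phrase."""
    fused = []
    for y in yolo_dets:
        match = max(dino_dets, key=lambda d: iou(y.box, d.box), default=None)
        if match and iou(y.box, match.box) >= iou_thr:
            # Average the two confidences and adopt the richer phrase label.
            fused.append(Detection(y.box, (y.score + match.score) / 2, match.label))
        else:
            fused.append(y)  # no overlap: keep the plain YOLOv9 detection
    return fused
```

A design note on this sketch: routing localization through YOLOv9 preserves its roughly fivefold speed advantage, while the DINO pass only needs to run when an open-vocabulary query (growth stage, disease, position) is requested.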