{"title":"利用深度学习模型推进夜间温室环境中的辣椒检测:通过与单发探测器融合的比较分析和改进的零发检测","authors":"Ayan Paul, Rajendra Machavaram","doi":"10.1016/j.fraope.2025.100243","DOIUrl":null,"url":null,"abstract":"<div><div>This study addresses capsicum detection in night-time greenhouse settings using a robust approach. A dataset of 300 images was curated, capturing various shooting distances, heights, occlusions, and lighting intensities, and underwent extensive pre-processing and augmentation. The single-shot custom-trained You Only Look Once version 9 (YOLOv9) model was evaluated, achieving precision, recall, F1 score, and mean Average Precision (mAP) of 0.898, 0.864, 0.881, and 0.947, respectively, with a detection speed of 38.46 frames per second (FPS). Concurrently, the zero-shot Grounding self-DIstillation with NO labels (Grounding DINO) model required no training and was hypertuned for capsicum detection using Google Colaboratory. Utilizing its Open Vocabulary Object Detection (OVOD) capability, the model successfully performed capsicum detection, positional search, growth stage detection, and diseased capsicum detection with confidence scores of 74 %, 43 %, 74 %, and 43 %, respectively. Comparative testing of both models on 100 test images containing 175 capsicums showed that YOLOv9 outperformed Grounding DINO with precision, recall, and F1 scores of 0.88, 0.86, and 0.87, compared to Grounding DINO's 0.72, 0.69, and 0.70. YOLOv9 also demonstrated an inference speed of 26 milliseconds, approximately five times faster than Grounding DINO. The fusion of YOLOv9 and Grounding DINO into You Only Look Once version Open Vocabulary Object Detection (YOLOvOVOD) significantly improved performance, achieving the highest confidence of 88 % for growth stage detection and a 65.11 % increase in confidence for positional search. This integrated approach leverages the strengths of both models, presenting a robust solution for future automation in agricultural machine vision.</div></div>","PeriodicalId":100554,"journal":{"name":"Franklin Open","volume":"10 ","pages":"Article 100243"},"PeriodicalIF":0.0000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Advancing capsicum detection in night-time greenhouse environments using deep learning models: Comparative analysis and improved zero-shot detection through fusion with a single-shot detector\",\"authors\":\"Ayan Paul, Rajendra Machavaram\",\"doi\":\"10.1016/j.fraope.2025.100243\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>This study addresses capsicum detection in night-time greenhouse settings using a robust approach. A dataset of 300 images was curated, capturing various shooting distances, heights, occlusions, and lighting intensities, and underwent extensive pre-processing and augmentation. The single-shot custom-trained You Only Look Once version 9 (YOLOv9) model was evaluated, achieving precision, recall, F1 score, and mean Average Precision (mAP) of 0.898, 0.864, 0.881, and 0.947, respectively, with a detection speed of 38.46 frames per second (FPS). Concurrently, the zero-shot Grounding self-DIstillation with NO labels (Grounding DINO) model required no training and was hypertuned for capsicum detection using Google Colaboratory. 
Utilizing its Open Vocabulary Object Detection (OVOD) capability, the model successfully performed capsicum detection, positional search, growth stage detection, and diseased capsicum detection with confidence scores of 74 %, 43 %, 74 %, and 43 %, respectively. Comparative testing of both models on 100 test images containing 175 capsicums showed that YOLOv9 outperformed Grounding DINO with precision, recall, and F1 scores of 0.88, 0.86, and 0.87, compared to Grounding DINO's 0.72, 0.69, and 0.70. YOLOv9 also demonstrated an inference speed of 26 milliseconds, approximately five times faster than Grounding DINO. The fusion of YOLOv9 and Grounding DINO into You Only Look Once version Open Vocabulary Object Detection (YOLOvOVOD) significantly improved performance, achieving the highest confidence of 88 % for growth stage detection and a 65.11 % increase in confidence for positional search. This integrated approach leverages the strengths of both models, presenting a robust solution for future automation in agricultural machine vision.</div></div>\",\"PeriodicalId\":100554,\"journal\":{\"name\":\"Franklin Open\",\"volume\":\"10 \",\"pages\":\"Article 100243\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Franklin Open\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2773186325000337\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Franklin Open","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2773186325000337","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Advancing capsicum detection in night-time greenhouse environments using deep learning models: Comparative analysis and improved zero-shot detection through fusion with a single-shot detector
This study addresses capsicum detection in night-time greenhouse settings using two complementary deep learning approaches. A dataset of 300 images was curated to capture varied shooting distances, heights, occlusion levels, and lighting intensities, and was extensively pre-processed and augmented. A custom-trained single-shot You Only Look Once version 9 (YOLOv9) model achieved a precision, recall, F1 score, and mean Average Precision (mAP) of 0.898, 0.864, 0.881, and 0.947, respectively, at a detection speed of 38.46 frames per second (FPS). Concurrently, the zero-shot Grounding DINO (DETR with Improved deNoising anchOr boxes) model required no training and was hyperparameter-tuned for capsicum detection in Google Colaboratory. Leveraging its Open Vocabulary Object Detection (OVOD) capability, the model performed capsicum detection, positional search, growth stage detection, and diseased capsicum detection with confidence scores of 74 %, 43 %, 74 %, and 43 %, respectively. Comparative testing of both models on 100 test images containing 175 capsicums showed that YOLOv9 outperformed Grounding DINO, scoring 0.88, 0.86, and 0.87 in precision, recall, and F1 against Grounding DINO's 0.72, 0.69, and 0.70. YOLOv9 also achieved an inference time of 26 milliseconds per image, approximately five times faster than Grounding DINO. Fusing YOLOv9 and Grounding DINO into You Only Look Once version Open Vocabulary Object Detection (YOLOvOVOD) significantly improved performance, raising growth stage detection confidence to a peak of 88 % and positional search confidence by 65.11 %. This integrated approach combines the strengths of both models, offering a robust solution for future automation in agricultural machine vision.
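The abstract does not publish the prompts or thresholds the authors used in Colaboratory, but a zero-shot OVOD query of the kind described can be sketched with the Hugging Face transformers port of Grounding DINO. The checkpoint id, image path, prompt text, and threshold values below are illustrative assumptions, not the paper's configuration.

```python
# Minimal zero-shot Grounding DINO sketch (Hugging Face transformers port).
# Checkpoint, prompt, and thresholds are assumptions, not the authors' setup.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-tiny"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id).eval()

image = Image.open("night_greenhouse.jpg")  # hypothetical test image
# Grounding DINO expects lower-case text queries, each ended with a period.
text = "green capsicum. diseased capsicum."

inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.35,   # assumed detection thresholds
    text_threshold=0.25,
    target_sizes=[image.size[::-1]],  # PIL size is (w, h); post-process wants (h, w)
)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"{label}: {score:.2f} at {box.tolist()}")
```

Because the queries are free text, the same pipeline covers the positional-search, growth-stage, and diseased-capsicum tasks by swapping the prompt rather than retraining.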
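The comparative scores follow the standard detection metrics P = TP/(TP+FP), R = TP/(TP+FN), and F1 = 2PR/(P+R). The paper does not report raw TP/FP/FN tallies, so the counts in this sketch are approximations back-solved from the reported YOLOv9 test rates, shown only to make the arithmetic concrete.

```python
# Standard detection metrics, with counts back-solved from the reported
# YOLOv9 test scores (P=0.88, R=0.86, F1=0.87 on 175 capsicums).
# TP/FP/FN below are illustrative approximations, not published tallies.
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r)

tp, fn = 150, 25  # ~150 of the 175 ground-truth capsicums found -> R ~ 0.857
fp = 20           # ~20 spurious boxes -> P = 150/170 ~ 0.882
p, r = precision(tp, fp), recall(tp, fn)
print(f"P={p:.2f} R={r:.2f} F1={f1(p, r):.2f}")  # P=0.88 R=0.86 F1=0.87
```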
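The abstract does not describe the mechanism behind the YOLOvOVOD fusion. One plausible reading, sketched below purely as an assumption, is to keep YOLOv9's fast closed-set boxes and attach Grounding DINO's open-vocabulary labels to overlapping detections via IoU matching; the matching rule, confidence combination, and all names here are hypothetical.

```python
# Hypothetical detector-fusion sketch: attach Grounding DINO's open-vocabulary
# labels to overlapping YOLOv9 boxes by IoU matching. The matching rule and
# confidence combination are assumptions; the paper's fusion may differ.
from dataclasses import dataclass

@dataclass
class Box:
    x1: float; y1: float; x2: float; y2: float
    score: float
    label: str

def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda box: (box.x2 - box.x1) * (box.y2 - box.y1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def fuse(yolo_boxes: list[Box], dino_boxes: list[Box], iou_thr: float = 0.5) -> list[Box]:
    """Keep YOLO geometry; adopt the best-overlapping DINO query label and
    boost confidence when both detectors agree on the same region."""
    fused = []
    for yb in yolo_boxes:
        best = max(dino_boxes, key=lambda db: iou(yb, db), default=None)
        if best is not None and iou(yb, best) >= iou_thr:
            # Agreement: keep the YOLO box, take the open-vocabulary label,
            # and use the larger of the two confidences (assumed rule).
            fused.append(Box(yb.x1, yb.y1, yb.x2, yb.y2,
                             max(yb.score, best.score), best.label))
        else:
            fused.append(yb)  # no open-vocab match: keep the plain YOLO detection
    return fused

yolo = [Box(10, 10, 60, 80, 0.91, "capsicum")]
dino = [Box(12, 8, 58, 82, 0.74, "ripe capsicum")]
print(fuse(yolo, dino))  # -> fused box labeled "ripe capsicum", score 0.91
```

A scheme of this shape would explain the reported behaviour qualitatively: the fast single-shot detector supplies throughput while the open-vocabulary queries supply the growth-stage and positional semantics.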