A novel automated cloud-based image datasets for high throughput phenotyping in weed classification

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief, vol. 57, Article 111097. Published 2024-11-01. DOI: 10.1016/j.dib.2024.111097

Sunil G C, Cengiz Koparan, Arjun Upadhyay, Mohammed Raju Ahmed, Yu Zhang, Kirk Howatt, Xin Sun

Abstract

Deep learning-based weed detection data management involves four phases: data acquisition, data labeling, model development, and model evaluation. Of these, data acquisition and data labeling are labor-intensive and time-consuming steps in building robust models. In addition, low temporal variation of crops and weeds in the datasets is one of the factors limiting effective weed detection model development. This article describes a cloud-based automatic data acquisition system (CADAS) that captures weed and crop images at fixed time intervals so that plant growth stages are taken into account for weed identification. The CADAS was developed by integrating fifteen visible-spectrum digital cameras with the gphoto2 libraries, external storage, cloud storage, and a computer running a Linux operating system. The dataset from the CADAS contains six weed species and eight crop species for weed and crop detection. A dataset of 2000 images per weed and crop species was publicly released. Raw RGB images were cropped using bounding box annotations to generate individual JPG images of crop and weed instances. In addition to the cropped images, 200 raw images with label files were released publicly. This dataset holds potential for investigating challenges in deep learning-based weed and crop detection in agricultural settings. Researchers could also use these data alongside field data to improve model performance by reducing data imbalance.
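
The acquisition loop described in the abstract (multiple cameras driven through gphoto2 and triggered at fixed intervals) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' CADAS code: it shells out to the `gphoto2` command-line tool instead of calling libgphoto2 bindings, assumes cameras are listed on `usb:BUS,DEVICE` ports in `gphoto2 --auto-detect` output, and uses a hypothetical `camNN_YYYYmmdd_HHMMSS.jpg` naming scheme. Copying images to external and cloud storage is omitted.

```python
import re
import subprocess
import time
from datetime import datetime
from pathlib import Path

# Assumption: ports appear as "usb:BUS,DEVICE" at the end of each
# camera line printed by `gphoto2 --auto-detect`.
PORT_RE = re.compile(r"(usb:\d{3},\d{3})\s*$")


def parse_ports(autodetect_output: str) -> list[str]:
    """Extract camera port identifiers from `gphoto2 --auto-detect` text."""
    ports = []
    for line in autodetect_output.splitlines():
        match = PORT_RE.search(line.strip())
        if match:
            ports.append(match.group(1))
    return ports


def capture_command(port: str, out_dir: Path, camera_id: int, stamp: str) -> list[str]:
    """Build a gphoto2 command that captures one image and downloads it locally.

    The file naming scheme is hypothetical, chosen for illustration only.
    """
    filename = out_dir / f"cam{camera_id:02d}_{stamp}.jpg"
    return [
        "gphoto2",
        "--port", port,
        "--capture-image-and-download",
        "--filename", str(filename),
    ]


def capture_round(ports: list[str], out_dir: Path) -> None:
    """Trigger every detected camera once, sequentially, sharing one timestamp."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    for camera_id, port in enumerate(ports, start=1):
        subprocess.run(capture_command(port, out_dir, camera_id, stamp), check=True)


def run_forever(interval_s: float, out_dir: Path) -> None:
    """Capture from all cameras at a fixed interval (hypothetical main loop)."""
    detected = subprocess.run(
        ["gphoto2", "--auto-detect"], capture_output=True, text=True, check=True
    )
    ports = parse_ports(detected.stdout)
    out_dir.mkdir(parents=True, exist_ok=True)
    while True:
        started = time.monotonic()
        capture_round(ports, out_dir)
        # Sleep only for the remainder of the interval so rounds keep a fixed cadence.
        time.sleep(max(0.0, interval_s - (time.monotonic() - started)))
```

Calling `run_forever(1800, Path("/mnt/external/cadas"))` would capture one round every 30 minutes; the interval and path are placeholders, not values reported for CADAS. Triggering cameras one after another keeps the sketch simple at the cost of some time skew between the first and last camera in a round.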
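
The cropping step (turning bounding box annotations on raw RGB images into one JPG per crop or weed instance) mostly comes down to converting label coordinates into pixel rectangles. The abstract does not state the label format, so this sketch assumes YOLO-style text labels: one `class cx cy w h` line per instance, with coordinates normalized to the 0..1 range. The `Box` type, class names, and output filename pattern are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Pixel rectangle in (left, upper, right, lower) order, plus its class index."""
    class_id: int
    left: int
    upper: int
    right: int
    lower: int


def yolo_line_to_box(line: str, img_w: int, img_h: int) -> Box:
    """Convert one 'class cx cy w h' line (assumed YOLO format, normalized) to pixels."""
    cls, *coords = line.split()
    cx, cy, w, h = (float(v) for v in coords)
    cx, cy, w, h = cx * img_w, cy * img_h, w * img_w, h * img_h
    # Clamp to the image bounds so a crop never extends outside the frame.
    left = max(0, round(cx - w / 2))
    upper = max(0, round(cy - h / 2))
    right = min(img_w, round(cx + w / 2))
    lower = min(img_h, round(cy + h / 2))
    return Box(int(cls), left, upper, right, lower)


def crop_boxes(label_text: str, img_w: int, img_h: int) -> list[Box]:
    """Parse a whole label file, skipping blank lines."""
    return [
        yolo_line_to_box(line, img_w, img_h)
        for line in label_text.splitlines()
        if line.strip()
    ]


def crop_filename(stem: str, index: int, box: Box, class_names: list[str]) -> str:
    """Name each instance crop after its source image, class, and index (hypothetical scheme)."""
    return f"{stem}_{class_names[box.class_id]}_{index:03d}.jpg"
```

With Pillow, each instance could then be cut out via `Image.crop((box.left, box.upper, box.right, box.lower))`, which takes exactly this (left, upper, right, lower) pixel tuple, and saved under the name returned by `crop_filename`.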

Source journal: Data in Brief (MULTIDISCIPLINARY SCIENCES)

CiteScore: 3.10

Self-citation rate: 0.00%

Articles published: 996

Review time: 70 days

Journal description: Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:

- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.

Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.