A novel automated cloud-based image datasets for high throughput phenotyping in weed classification

Impact factor: 1.0; JCR quartile: Q3 (Multidisciplinary Sciences)
Sunil G C , Cengiz Koparan , Arjun Upadhyay , Mohammed Raju Ahmed , Yu Zhang , Kirk Howatt , Xin Sun
Journal: Data in Brief, volume 57, Article 111097
Published: 2024-11-01
DOI: 10.1016/j.dib.2024.111097
URL: https://www.sciencedirect.com/science/article/pii/S235234092401059X
Citation count: 0

Abstract

Data management for deep learning-based weed detection involves four phases: data acquisition, data labeling, model development, and model evaluation. Of these, data acquisition and data labeling are the labor-intensive and time-consuming steps in building robust models. In addition, low temporal variation of crops and weeds in existing datasets is one of the limiting factors for developing effective weed detection models. This article describes a cloud-based automatic data acquisition system (CADAS) that captures weed and crop images at fixed time intervals so that plant growth stages are taken into account in weed identification. CADAS was developed by integrating fifteen visible-spectrum digital cameras with gphoto2 libraries, external storage, cloud storage, and a computer running the Linux operating system. The dataset from the CADAS system contains six weed species and eight crop species for weed and crop detection. A dataset of 2000 images per weed and crop species was publicly released. Raw RGB images were cropped according to bounding box annotations to generate individual JPG images of crop and weed instances. In addition to the cropped images, 200 raw images with label files were publicly released. This dataset holds potential for investigating challenges in deep learning-based weed and crop detection in agricultural settings. Researchers could also combine these data with field data to boost model performance by reducing data imbalance.
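The cropping step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the label file format, so this example assumes YOLO-style lines (`class_id x_center y_center width height`, all normalized to the range 0 to 1). The function names `yolo_to_pixel_box`, `crop`, and `crop_instances` are hypothetical and are not taken from the CADAS code. Nested Python lists stand in for an image so that the sketch runs without any imaging library.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]          # one RGB pixel
Image = List[List[Pixel]]             # rows of pixels (height x width)
Box = Tuple[int, int, int, int]       # left, top, right, bottom in pixels


def yolo_to_pixel_box(xc: float, yc: float, w: float, h: float,
                      img_w: int, img_h: int) -> Box:
    """Convert a normalized center/size box to pixel bounds, clamped to the image.

    Assumption: labels are YOLO-style; the source does not confirm the format.
    """
    left = max(0, round((xc - w / 2) * img_w))
    top = max(0, round((yc - h / 2) * img_h))
    right = min(img_w, round((xc + w / 2) * img_w))
    bottom = min(img_h, round((yc + h / 2) * img_h))
    return left, top, right, bottom


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image inside box (right and bottom are exclusive)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def crop_instances(image: Image, label_lines: List[str]) -> List[Tuple[int, Image]]:
    """Cut one patch per annotation line; lines without exactly 5 fields are skipped."""
    img_h, img_w = len(image), len(image[0])
    patches = []
    for line in label_lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # not a well-formed annotation line
        class_id = int(parts[0])
        xc, yc, w, h = (float(p) for p in parts[1:])
        box = yolo_to_pixel_box(xc, yc, w, h, img_w, img_h)
        patches.append((class_id, crop(image, box)))
    return patches


if __name__ == "__main__":
    # Synthetic 10x8 image whose pixel values encode their own coordinates.
    demo = [[(x, y, 0) for x in range(10)] for y in range(8)]
    for class_id, patch in crop_instances(demo, ["2 0.5 0.5 0.4 0.5"]):
        print(class_id, len(patch[0]), "x", len(patch))
```

With a real imaging library, the same pixel box could be passed to that library's crop routine and each patch saved as a separate JPG file, which is the kind of per-instance output the abstract describes.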
Source journal
Data in Brief (Multidisciplinary Sciences)
CiteScore: 3.10
Self-citation rate: 0.00%
Articles published: 996
Review time: 70 days
Journal description: Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that: -Thoroughly describe your data, facilitating reproducibility. -Make your data, which is often buried in supplementary material, easier to find. -Increase traffic towards associated research articles and data, leading to more citations. -Open up doors for new collaborations. Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.