Cauliflower leaf diseases: A computer vision dataset for smart agriculture

Impact Factor: 1.0, JCR quartile: Q3, Category: MULTIDISCIPLINARY SCIENCES

Data in Brief, Pub Date: 2025-04-28, DOI: 10.1016/j.dib.2025.111594

Sabbir Hossain Durjoy, Md. Emon Shikder, Md Mehedi Hasan Shoib, Md Hasan Imam Bijoy

Volume 60, Article 111594. Publisher page: https://www.sciencedirect.com/science/article/pii/S2352340925003269

Citations: 0

Abstract

Cauliflower is among the better-known vegetables. It is consumed around the globe because it is rich in nutrients such as vitamins and antioxidants and is high in fibre. These nutritional qualities aid digestion, support the immune system, and help minimize inflammation. Farmers commonly have to deal with various cauliflower leaf diseases that are difficult to diagnose in their early stages. These diseases tend to spread swiftly through entire fields of crops. This in turn causes heavy harvest losses and makes protecting the crops far more tedious and resource-intensive. As a result, farmers become more likely to apply large amounts of pesticides and other harmful chemicals to obtain a more reliable yield. This is not only costly but also harmful to both the quality of the crops and the well-being of the environment.

In this publication, we introduce a dataset of 2,661 images of cauliflower leaves. It is intended to accelerate development on this topic and to help enhance disease monitoring, diagnosis, and precautionary techniques. We collected the images between November 2024 and January 2025. The leaves are categorized into three classes: Healthy, Insect Holes, and Black Rot, each reflecting a specific condition that affects plant health at different stages. The pictures were captured at different locations in Bangladesh, on different dates, under different weather conditions and temperatures, and with different devices.

To enhance data quality, we processed the dataset in several steps so that it reflects real-world conditions and is ready for training. The images were resized to a standard size of 3000 × 3000 pixels, brightness was adjusted to make the images easier to discern, and duplicates and poor-quality images were removed. These steps helped ensure the dataset is in good shape for effective model training.

This dataset will be highly useful for agricultural research, precision agriculture, and disease management. It should help develop highly accurate machine learning models for early detection of cauliflower leaf diseases. The dataset is used to train deep learning models to support automated monitoring and smart decision-making in precision agriculture. It also has considerable potential for real-time, practical use: it can support applications such as mobile apps or automated systems that let farmers identify diseases at early stages and take immediate action without on-site expert knowledge. The dataset can also be used with smart farming equipment such as drones and sensors to monitor large fields in real time.
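The abstract says duplicates were removed but does not say how they were detected. As a minimal sketch only, assuming that "duplicate" means a byte-identical file, the step could look like the following. The function names, the file extensions, and the idea of scanning a folder tree are illustrative assumptions, not the authors' published procedure; resizing and brightness adjustment are not shown.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path, suffixes=(".jpg", ".jpeg", ".png")) -> dict[Path, Path]:
    """Map each duplicate image to the first file seen with identical bytes.

    Assumption: images live somewhere under `root`; the suffix list is a guess.
    """
    first_seen: dict[str, Path] = {}
    duplicates: dict[Path, Path] = {}
    # Sort so that the "first seen" file is deterministic across runs.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        key = file_digest(path)
        if key in first_seen:
            duplicates[path] = first_seen[key]
        else:
            first_seen[key] = path
    return duplicates
```

Calling find_exact_duplicates(Path("dataset")) on a hypothetical local copy returns a mapping from each redundant file to the copy that would be kept. Content hashing catches only exact copies; re-encoded or resized near-duplicates would need a different technique, such as perceptual hashing.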

Source journal: Data in Brief (MULTIDISCIPLINARY SCIENCES)

CiteScore: 3.10

Self-citation rate: 0.00%

Articles published: 996

Review time: 70 days

Journal description: Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.