Sabbir Hossain Durjoy, Md. Emon Shikder, Md Mehedi Hasan Shoib, Md Hasan Imam Bijoy
Cauliflower leaf diseases: A computer vision dataset for smart agriculture
Data in Brief, Volume 60, Article 111594. Published 2025-04-28.
DOI: 10.1016/j.dib.2025.111594
https://www.sciencedirect.com/science/article/pii/S2352340925003269
Impact factor: 1.0. JCR: Q3 (Multidisciplinary Sciences).
Citation count: 0
Abstract
Cauliflower is among the better-known vegetables. It is consumed around the globe because it is rich in nutrients such as vitamins and antioxidants and is high in fibre. These nutritional qualities help with digestion, immune function, and reducing inflammation. Farmers commonly have to deal with various cauliflower leaf diseases that are difficult to diagnose in their early stages. These diseases tend to spread swiftly through entire fields of crops. This in turn causes heavy harvest losses and makes protecting the crops much more tedious and resource-intensive. As a result, farmers become more likely to apply large amounts of pesticides and harmful chemicals to secure a more reliable yield. This is not only costly but also harmful to both crop quality and the environment. In this publication, we introduce a dataset of cauliflower leaf images. It is intended to speed up development on this topic and to help improve disease monitoring, diagnosis, and precautionary techniques. We collected the images between November 2024 and January 2025. The leaves are categorized into three classes: Healthy, Insect Holes, and Black Rot, each reflecting a specific condition that affects plant health at different stages. The dataset consists of 2,661 images. The pictures were captured at different locations in Bangladesh, on different dates, under different weather conditions and temperatures, and with different devices. To improve data quality, we processed the dataset in several steps so that it reflects real-world conditions and is ready for training. The images were resized to a standard size of 3000 × 3000 pixels, brightness was adjusted to make the images easier to discern, and duplicates and poor-quality images were removed. These steps helped prepare the dataset for effective model training. The dataset should be useful for agricultural research, precision agriculture, and disease management, and it should help in developing accurate machine learning models for early detection of cauliflower leaf diseases. The dataset is used to train deep learning models that support automated monitoring and smart decision-making in precision agriculture. It also has potential for real-time, practical use. It can be used to develop applications such as mobile apps or automated systems with which farmers can identify diseases at early stages and take immediate action without needing expert knowledge on site. It can also be used with smart farming equipment such as drones and sensors to monitor large fields in real time.
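The abstract mentions removing duplicates and sorting images into three classes but does not say how either was done. The following is a minimal sketch of one possible check, not the authors' procedure. It assumes a hypothetical folder-per-class layout (one directory named after each class) and treats only byte-identical files as duplicates. The function names and the list of file suffixes are illustrative. Resizing and brightness adjustment would need an image library and are not shown.

```python
import hashlib
from collections import Counter
from pathlib import Path
from typing import Optional

# Class names come from the abstract; the folder-per-class layout is an assumption.
CLASSES = ("Healthy", "Insect Holes", "Black Rot")
# Assumed image file suffixes; the abstract does not state the file format.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root: Path) -> list:
    """List image files whose bytes match an earlier file under root.

    Files are visited in sorted path order, so the first file in that
    order is kept and later byte-identical copies are reported.
    """
    seen = {}
    duplicates = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        digest = file_digest(path)
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
    return duplicates


def count_per_class(root: Path, skip: Optional[set] = None) -> Counter:
    """Count image files in each class folder, ignoring any paths in skip."""
    skip = skip or set()
    counts = Counter()
    for name in CLASSES:
        folder = root / name
        if not folder.is_dir():
            counts[name] = 0
            continue
        counts[name] = sum(
            1
            for p in folder.iterdir()
            if p.is_file()
            and p.suffix.lower() in IMAGE_SUFFIXES
            and p not in skip
        )
    return counts
```

Calling `find_exact_duplicates(Path("dataset"))` and passing the result as `skip=set(...)` to `count_per_class` gives per-class counts after exact duplicates are excluded. Near-duplicates, such as the same photo re-encoded or resized, would not be caught by a byte-level hash and would need a perceptual comparison instead.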
About the journal:
Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.