Data in Brief: Latest Articles

PeruFoodNet: A unique dataset of traditional peruvian food for image recognition systems and allergenic ingredient inference
IF 1
Data in Brief Pub Date : 2025-05-01 DOI: 10.1016/j.dib.2025.111604
María Franchesca Arzola Gutierrez, Edgar Alexander Canchari Muñoz, Edwin Jonathan Escobedo Cárdenas
{"title":"PeruFoodNet: A unique dataset of traditional peruvian food for image recognition systems and allergenic ingredient inference","authors":"María Franchesca Arzola Gutierrez, Edgar Alexander Canchari Muñoz, Edwin Jonathan Escobedo Cárdenas","doi":"10.1016/j.dib.2025.111604","DOIUrl":"10.1016/j.dib.2025.111604","url":null,"abstract":"Peruvian cuisine has won numerous international awards, attracting tourists from around the world to Peru to experience its diverse culinary offerings. However, some dishes contain ingredients that can trigger allergic reactions, posing a potential health risk for visitors. To address this, we created PeruFoodNet, a dataset featuring 4,000 images of traditional Peruvian dishes. The dataset includes 40 of the most popular dishes, such as Ceviche and Anticuchos, with 100 images of each dish. The images of the dishes have been captured from various angles, settings, lighting conditions, dimensions and backgrounds. To gather these images, we prepared the dishes ourselves, purchased some from restaurants, and received contributions from external users over a two-month period. However, most of the images were captured by the authors of the dataset. The dataset is publicly available and can be valuable for research in image recognition and classification using Computer Science techniques, such as Deep Learning. Additionally, it can aid in identifying allergenic ingredients in dishes by linking the dish’s image to a list of ingredients through a technological platform, such as a chatbot or an app.","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60","pages":"Article 111604"},"PeriodicalIF":1.0,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143941538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
GastroEndoNet: Comprehensive endoscopy image dataset for GERD and polyp detection
IF 1
Data in Brief Pub Date : 2025-05-01 DOI: 10.1016/j.dib.2025.111572
Abu Kowshir Bitto, Md. Hasan Imam Bijoy, Kamrul Hassan Shakil, Aka Das, Khalid Been Badruzzaman Biplob, Imran Mahmud, Syed Md. Minhaz Hossain
{"title":"GastroEndoNet: Comprehensive endoscopy image dataset for GERD and polyp detection","authors":"Abu Kowshir Bitto, Md. Hasan Imam Bijoy, Kamrul Hassan Shakil, Aka Das, Khalid Been Badruzzaman Biplob, Imran Mahmud, Syed Md. Minhaz Hossain","doi":"10.1016/j.dib.2025.111572","DOIUrl":"10.1016/j.dib.2025.111572","url":null,"abstract":"The gastrointestinal (GI) system is fundamental to human health, supporting digestion, nutrient absorption, and waste elimination. Disruptions in GI function, such as Gastroesophageal Reflux Disease (GERD) and gastrointestinal polyps, can lead to significant health complications if not diagnosed and managed early. However, manual interpretation of endoscopic images is time-consuming and prone to human error, highlighting the need for automated diagnostic tools. In this study, we introduce a comprehensive dataset of 24,036 high-quality endoscopic images, categorized into four classes: GERD, GERD Normal, Polyp, and Polyp Normal. This dataset is designed to facilitate research in automated detection and classification of these conditions through machine learning algorithms. The dataset consists of 4006 primary images collected following endoscopic procedures, which were augmented using six distinct techniques, expanding the total number of images to 24,036. It includes 5844 images of GERD cases (974 primary images), 6618 images of GERD Normal (1103 primary images), 4674 images of Polyps (779 primary images), and 6900 images of Polyp Normal (1150 primary images). These images, pre-processed and resized to a resolution of 512 × 512 pixels, were obtained from Zainul Haque Sikder Women’s Medical College & Hospital (Pvt.) Ltd. and saved in JPG format. This dataset addresses a critical gap in the availability of large, diverse, and well-labelled medical image datasets for training AI-driven healthcare solutions. It provides an invaluable resource for developing machine learning models aimed at the automatic diagnosis, classification, and detection of GERD and polyps, potentially improving the speed and accuracy of clinical decision-making. By leveraging this dataset, researchers can contribute to enhanced diagnostic tools that could significantly improve healthcare outcomes and patient quality of life in the field of gastroenterology.","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60","pages":"Article 111572"},"PeriodicalIF":1.0,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143927485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
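The per-class figures quoted in the GastroEndoNet abstract above can be cross-checked with simple arithmetic. The Python sketch below uses only numbers stated in the abstract; the multiplier of 6 is inferred from those numbers (24,036 / 4006) rather than stated as a rule by the authors, and the names `primary`, `reported`, and `check_counts` are illustrative.

```python
# Primary (pre-augmentation) and reported (post-augmentation) image counts,
# copied from the abstract.
primary = {"GERD": 974, "GERD Normal": 1103, "Polyp": 779, "Polyp Normal": 1150}
reported = {"GERD": 5844, "GERD Normal": 6618, "Polyp": 4674, "Polyp Normal": 6900}

# Assumption: every primary image contributes the same number of final images.
FACTOR = 6


def check_counts(primary_counts, reported_counts, factor):
    """Return, per class, whether primary * factor matches the reported total."""
    return {
        name: primary_counts[name] * factor == reported_counts[name]
        for name in primary_counts
    }


total_primary = sum(primary.values())
total_reported = sum(reported.values())
per_class_ok = check_counts(primary, reported, FACTOR)

print(total_primary, total_reported, per_class_ok)
```

Under that assumption the four classes and both totals are mutually consistent, which is a useful sanity check before building train/test splits: augmented copies of one primary image should stay in the same split.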
An open-source and spatially diverse synthetic population dataset for agent-based modelling and microsimulation in Ireland
IF 1
Data in Brief Pub Date : 2025-05-01 DOI: 10.1016/j.dib.2025.111611
Seán Caulfield Curley, Karl Mason, Patrick Mannion
{"title":"An open-source and spatially diverse synthetic population dataset for agent-based modelling and microsimulation in Ireland","authors":"Seán Caulfield Curley, Karl Mason, Patrick Mannion","doi":"10.1016/j.dib.2025.111611","DOIUrl":"10.1016/j.dib.2025.111611","url":null,"abstract":"Spatial microsimulations, where simulation units represent people or households in a small area, are extremely useful for modelling a wide range of socio-economic scenarios at a fine scale. The characteristics of individuals in these simulations' populations need to accurately represent the real characteristics of the target area to model realistic scenarios. However, individual-level data is not available for the vast majority of populations, Ireland included, due to privacy concerns. Thus, a representative synthetic population for the Republic of Ireland is needed. The data from four methods of generating synthetic populations at the Electoral Division level are given in this paper. Realistic individuals are created by sampling from the Central Statistics Office (CSO) Labour Force Survey. Spatial heterogeneity is achieved by matching the aggregate counts of individuals' characteristics to those from the CSO Census Small Area Population Statistics. Individuals are assigned six characteristics: age group, sex, marital status, house size, primary economic status, and highest level of education achieved.","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60","pages":"Article 111611"},"PeriodicalIF":1.0,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144070217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
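The synthetic population abstract above describes matching aggregate counts of individuals' characteristics to census totals but does not name its four generation methods. As a loose illustration of the general idea only, the Python sketch below uses iterative proportional fitting (IPF), a widely used technique that is swapped in here as a stand-in and is not confirmed as one of the authors' methods. The seed table and target margins are made-up numbers, and `ipf` is a hypothetical helper.

```python
def ipf(seed, row_targets, col_targets, iterations=50):
    """Rescale a 2-D seed table so its row and column sums match the targets.

    seed        -- cross-tabulated counts from a survey sample (list of lists)
    row_targets -- census totals for the row characteristic (e.g. age group)
    col_targets -- census totals for the column characteristic (e.g. sex)
    """
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [value * target / row_sum for value in table[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
    return table


# Illustrative inputs only: two age groups by two sexes for one small area.
seed = [[30.0, 20.0], [25.0, 25.0]]
row_targets = [60.0, 40.0]
col_targets = [55.0, 45.0]

fitted = ipf(seed, row_targets, col_targets)
print(fitted)
```

The fitted cell values can then serve as sampling weights when drawing survey individuals for that area, so the area's synthetic population reproduces the census margins while keeping the joint structure seen in the survey.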
The chicken gut resistome data from different regions of Kazakhstan
IF 1
Data in Brief Pub Date : 2025-05-01 DOI: 10.1016/j.dib.2025.111608
Sergey Shilov, Ilya Korotetskiy, Tatyana Kuznetsova, Natalya Zubenko, Lyudmila Ivanova, Elena Solodova, Alfiya Tugeyeva, Anzor Kaziyev, Nadezhda Korotetskaya, Timur Izmailov
{"title":"The chicken gut resistome data from different regions of Kazakhstan","authors":"Sergey Shilov, Ilya Korotetskiy, Tatyana Kuznetsova, Natalya Zubenko, Lyudmila Ivanova, Elena Solodova, Alfiya Tugeyeva, Anzor Kaziyev, Nadezhda Korotetskaya, Timur Izmailov","doi":"10.1016/j.dib.2025.111608","DOIUrl":"10.1016/j.dib.2025.111608","url":null,"abstract":"Antibiotic resistance (AR) is a serious global health problem affecting both human medicine and animal agriculture. In poultry farming, especially industrial poultry production, antibiotics are widely used for disease prevention and growth promotion, leading to the accumulation and dissemination of antibiotic resistance genes (ARGs) within the intestinal microbiomes of birds. Poultry, which often have close contact with humans, can serve as reservoirs for resistant microorganisms, posing potential public health risks. Determination of avian intestinal resistomes through metagenomic sequencing and bioinformatics analysis enables the identification of the diversity and transmission dynamics of ARGs and the evaluation of how environmental factors and poultry-rearing conditions influence resistance gene distribution. This article presents resistome analysis data for the gut microbiota of chicken populations from different regions of Kazakhstan. The data obtained will support the development of a strategy to reduce the spread of antibiotic-resistant pathogens and improve safety in poultry farming, as well as the prediction of the risk of transmission of resistant microorganisms between animals and humans.","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60","pages":"Article 111608"},"PeriodicalIF":1.0,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143941535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
16S rRNA sequencing data of fecal microbiota from juvenile female rats following prenatal poly I:C exposure
IF 1
Data in Brief Pub Date : 2025-05-01 DOI: 10.1016/j.dib.2025.111588
Lirong Yang, Huiyu Chen, Menglu Zeng, Yanfang Lu, Chen Xu, Zhenju Cao, Fuchun Zhong, Xinyu Yang, Anying Shen, Fei Xue, Wei Lin, Hua Cao, Chao Deng, Yueqing Su
{"title":"16S rRNA sequencing data of fecal microbiota from juvenile female rats following prenatal poly I:C exposure","authors":"Lirong Yang, Huiyu Chen, Menglu Zeng, Yanfang Lu, Chen Xu, Zhenju Cao, Fuchun Zhong, Xinyu Yang, Anying Shen, Fei Xue, Wei Lin, Hua Cao, Chao Deng, Yueqing Su","doi":"10.1016/j.dib.2025.111588","DOIUrl":"10.1016/j.dib.2025.111588","url":null,"abstract":"With the expanding insights into the “gut-brain axis,” the association between neurodevelopmental disorders (NDDs) and gut microbiota has gained extensive attention in scientific research. Maternal immune activation (MIA) in pregnant females is a crucial environmental risk factor for NDDs in offspring. Polyriboinosinic-polyribocytidylic acid (Poly I:C) belongs to a class of synthetic analogs of double-stranded RNA used to induce MIA in rodents and is widely used in scientific research. The dataset presents 16S rRNA sequencing data of fecal microbiota from juvenile female rats following prenatal Poly I:C exposure. It will help to clarify the impact of prenatal Poly I:C exposure on the intestinal microbiota characteristics of juvenile female offspring rats, reveal the role of the intestinal microbiota in the pathophysiological changes related to maternal immune activation and provide new insights into the mechanism of the microbiota-gut-brain axis in neurodevelopmental disorders.","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60","pages":"Article 111588"},"PeriodicalIF":1.0,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144070216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Comprehensive smartphone image dataset for Aegle Marmelos, Hog plum, and lemon plant leaf disease and freshness assessment
IF 1
Data in Brief Pub Date : 2025-04-29 DOI: 10.1016/j.dib.2025.111590
Mohammad Rezwanul Huq, Jubaer Ahmed, Raiyan Gani, Maherun Nessa Isty, Tasmia Islam
{"title":"Comprehensive smartphone image dataset for Aegle Marmelos, Hog plum, and lemon plant leaf disease and freshness assessment","authors":"Mohammad Rezwanul Huq, Jubaer Ahmed, Raiyan Gani, Maherun Nessa Isty, Tasmia Islam","doi":"10.1016/j.dib.2025.111590","DOIUrl":"10.1016/j.dib.2025.111590","url":null,"abstract":"Fruits, which are packed with nutrients, vitamins, and antioxidants, have been known for their numerous health benefits and curative powers, and are utilized in conventional medicine. Aegle Marmelos, Lemon, and Hog Plum are tangy fruits widely recognized in Asian countries for containing a plentiful supply of bioactive substances. They are also highly valuable in boosting metabolism, possessing tremendous therapeutic properties, and holding financial significance. The leaves of these fruit trees are as essential as their fruits, as they contain versatile medicinal and dietary benefits of immense value. However, these leaves are often affected by various fungal and other diseases, which reduce the healthy growth and productivity of both fruits and leaves. Plants infected with leaf diseases can produce fewer fruits, which are also of lower quality because they fail to reach maturity and lack sufficient nutritional value. For these reasons, there is a risk of an outbreak in orchards, which can lead to significant financial losses for both producers and the agricultural sector. Early identification of leaf diseases and careful orchard management are therefore essential to minimize the impact of these diseases and to ensure the healthy production of valuable medicinal fruits. In this paper, various infected leaf images are collected from different regions of Rangpur, providing a comprehensive dataset comprising 3941 images. The dataset includes images of the leaves of three plants: 1513 images of Aegle Marmelos, 1232 images of Lemon, and 1196 images of Hog Plum. Each category encompasses several classes of common leaf diseases. Through this dataset, an early and accurate digital detection system can be employed, allowing producers to clearly identify diseases instead of relying on traditional methods. The precise and timely identification of leaf diseases enables the control of these diseases by taking necessary actions, ensuring the sustainability of plants, and promoting the healthy growth of these invaluable medicinal fruits.","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60","pages":"Article 111590"},"PeriodicalIF":1.0,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143927354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Whole genome sequence data of Bacillus anthracis strain 3B1 isolated from rice soil
IF 1
Data in Brief Pub Date : 2025-04-29 DOI: 10.1016/j.dib.2025.111586
Rosamond Chan, Kah-Ooi Chua, Kelly Wan-Ee Teo, Dedat Prismantoro, Nurul Shamsinah Mohd Suhaimi, Abdullah Bilal Ozturk, Nia Rossiana, Febri Doni
{"title":"Whole genome sequence data of Bacillus anthracis strain 3B1 isolated from rice soil","authors":"Rosamond Chan, Kah-Ooi Chua, Kelly Wan-Ee Teo, Dedat Prismantoro, Nurul Shamsinah Mohd Suhaimi, Abdullah Bilal Ozturk, Nia Rossiana, Febri Doni","doi":"10.1016/j.dib.2025.111586","DOIUrl":"10.1016/j.dib.2025.111586","url":null,"abstract":"Strain 3B1 was isolated from the soil of a rice field cultivated under the system of rice intensification (SRI) in Sukabumi, West Java, Indonesia. The genome of strain 3B1 was sequenced using the MGI DNBSEQ platform, followed by bioinformatics processing, including genome assembly and gene annotation using SPAdes and Prokka, respectively. The assembled genome had a total length of 5,137,985 bp, distributed across 70 contigs, with 5,364 genes identified. Strain 3B1 shared the highest 16S rRNA gene sequence identity with Bacillus paranthracis, B. nitratireducens, B. cereus, B. paramycoides, B. tropicus, and B. anthracis, in the range of 99.86 to 99.93%. Both 16S rRNA gene-based and core gene-based phylogenetic analyses placed strain 3B1 in the same clade as B. anthracis strain Ames within the Bacillus genus. The phylogenetic placement was supported by the highest average nucleotide identity (ANI) value of 98.1% and digital DNA-DNA hybridization (dDDH) value of 82.7% shared between the genomes of B. anthracis strain Ames and strain 3B1, indicating that 3B1 is a strain of B. anthracis. Further gene annotation revealed that the genome of strain 3B1 lacked genes encoding virulence factors, such as pag, cya, and lef. Nonetheless, this data provides valuable insights into the genomic features of strain 3B1, which can be bioprospected for various biotechnological applications.","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60","pages":"Article 111586"},"PeriodicalIF":1.0,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143922237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Mexican dataset of digital mammograms (MEXBreast) with suspicious clusters of microcalcifications
IF 1
Data in Brief Pub Date : 2025-04-28 DOI: 10.1016/j.dib.2025.111587
Ricardo Salvador Luna Lozoya, Karina Núnez Barragán, Humberto de Jesús Ochoa Domínguez, Juan Humberto Sossa Azuela, Vianey Guadalupe Cruz Sánchez, Osslan Osiris Vergara Villegas
{"title":"Mexican dataset of digital mammograms (MEXBreast) with suspicious clusters of microcalcifications","authors":"Ricardo Salvador Luna Lozoya, Karina Núnez Barragán, Humberto de Jesús Ochoa Domínguez, Juan Humberto Sossa Azuela, Vianey Guadalupe Cruz Sánchez, Osslan Osiris Vergara Villegas","doi":"10.1016/j.dib.2025.111587","DOIUrl":"10.1016/j.dib.2025.111587","url":null,"abstract":"Breast cancer is one of the most prevalent cancers affecting women worldwide. Early detection and treatment are crucial in significantly reducing mortality rates. Microcalcifications (MCs) are of particular importance among the various breast lesions. These tiny calcium deposits within breast tissue are present in approximately 30% of malignant tumors and can serve as critical indirect indicators of early-stage breast cancer. Three or more MCs within an area of 1 cm² are considered a Microcalcification Cluster (MCC) and assigned a BI-RADS category 4, indicating a suspicion of malignancy. Mammography is the most widely used technique for breast cancer detection. Approximately one in two mammograms showing MCCs is confirmed as cancerous through biopsy. MCCs are challenging to detect, even for experienced radiologists, underscoring the need for computer-aided detection tools such as Convolutional Neural Networks (CNNs). CNNs require large amounts of domain-specific data with consistent resolutions for effective training. However, most publicly available mammogram datasets either lack resolution information or are compiled from heterogeneous sources. Additionally, MCCs are often either unlabeled or sparsely represented in these datasets, limiting their utility for training CNNs. Here, we present MEXBreast, an annotated Mexican digital mammogram database of MCCs containing images at resolutions of 50, 70, and 100 microns. MEXBreast aims to support the training, validation, and testing of deep learning CNNs.","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60","pages":"Article 111587"},"PeriodicalIF":1.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143927364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
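The MEXBreast abstract above defines a microcalcification cluster as three or more MCs within 1 cm² and stresses that pixel resolution matters. The Python sketch below shows one way to apply that rule to annotated MC centres given in pixel coordinates. It assumes the 1 cm² area is an axis-aligned 1 cm × 1 cm square, which is a simplification chosen for illustration and not a definition taken from the paper; `window_side_pixels` and `has_cluster` are hypothetical helpers.

```python
MICRONS_PER_CM = 10_000


def window_side_pixels(pixel_size_microns):
    """Side length, in pixels, of a 1 cm square at the given pixel size."""
    return MICRONS_PER_CM / pixel_size_microns


def has_cluster(points_px, pixel_size_microns, min_count=3):
    """Return True if some 1 cm x 1 cm square holds at least min_count points.

    Any square that contains a group of points can be slid until its left and
    bottom edges touch point coordinates, so it is enough to try windows whose
    lower-left corner takes its x and y from the points themselves.
    """
    side = window_side_pixels(pixel_size_microns)
    xs = sorted({x for x, _ in points_px})
    ys = sorted({y for _, y in points_px})
    for x0 in xs:
        for y0 in ys:
            inside = sum(
                1
                for x, y in points_px
                if x0 <= x <= x0 + side and y0 <= y <= y0 + side
            )
            if inside >= min_count:
                return True
    return False


# Illustrative pixel coordinates, not taken from the dataset.
points = [(0, 0), (100, 50), (150, 150)]
print(has_cluster(points, 50))   # 1 cm spans 200 pixels at 50 microns
print(has_cluster(points, 100))  # 1 cm spans 100 pixels at 100 microns
```

The same pixel coordinates form a cluster at 50 microns per pixel but not at 100 microns per pixel, which illustrates why datasets without resolution information are hard to use for this task.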
Cauliflower leaf diseases: A computer vision dataset for smart agriculture
IF 1
Data in Brief Pub Date : 2025-04-28 DOI: 10.1016/j.dib.2025.111594
Sabbir Hossain Durjoy, Md. Emon Shikder, Md Mehedi Hasan Shoib, Md Hasan Imam Bijoy
{"title":"Cauliflower leaf diseases: A computer vision dataset for smart agriculture","authors":"Sabbir Hossain Durjoy, Md. Emon Shikder, Md Mehedi Hasan Shoib, Md Hasan Imam Bijoy","doi":"10.1016/j.dib.2025.111594","DOIUrl":"10.1016/j.dib.2025.111594","url":null,"abstract":"Cauliflower is among the best-known vegetables and is consumed around the globe because it is rich in nutrients such as vitamins and antioxidants and is high in fibre. These nutritional qualities support digestion and immune function and help minimize inflammation. Farmers commonly have to deal with various diseases of cauliflower leaves that are difficult to diagnose in their early stages. These diseases tend to spread rapidly across entire fields of crops. This in turn causes heavy harvest losses and makes protecting the crops more tedious and resource-intensive. As a result, farmers become more likely to use large amounts of pesticides and harmful chemicals to obtain a more reliable yield. This is not only costly but also harmful to both the quality of the crops and the well-being of the environment. In this publication, we introduce a dataset containing a considerable number of images of cauliflower leaves. It is intended to accelerate development on this topic and to help enhance disease monitoring, diagnosis, and precautionary techniques. We collected the dataset images between November 2024 and January 2025. In this dataset, cauliflower leaves were categorized into three classes: Healthy, Insect Holes, and Black Rot, each reflecting a specific condition that impacts plant health at different stages. The dataset consists of 2,661 images. The pictures were captured at different locations in Bangladesh, under different weather conditions, dates, and temperatures, and with different devices. To enhance data quality, we processed the dataset in several steps so that it reflects real-world conditions and is ready for training. The images were resized to a standard size of 3000 × 3000 pixels, brightness was adjusted to make the images more easily discernible, and duplicates and poor-quality images were removed. These actions helped ensure the dataset was in the best possible shape for effective model training. The dataset should be highly useful for agricultural research, precision agriculture, and disease management. It should help develop highly accurate machine learning models for early detection of cauliflower leaf diseases. The dataset can be employed to train deep learning models that support automated monitoring and smart decision-making in precision agriculture. It also has considerable potential for real-time and practical use: it can be used to develop mobile apps or automated systems with which farmers can identify diseases at early stages and take immediate action without requiring expert on-site knowledge. It can also be used with smart farming equipment such as drones and sensors to monitor large fields in real time.","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60","pages":"Article 111594"},"PeriodicalIF":1.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143922239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Improving internet of vehicles research: A systematic preprocessing framework for the VeReMi dataset
IF 1
Data in Brief Pub Date : 2025-04-28 DOI: 10.1016/j.dib.2025.111599
Aparup Roy, Debotosh Bhattacharjee, Ondrej Krejcar
{"title":"Improving internet of vehicles research: A systematic preprocessing framework for the VeReMi dataset","authors":"Aparup Roy, Debotosh Bhattacharjee, Ondrej Krejcar","doi":"10.1016/j.dib.2025.111599","DOIUrl":"10.1016/j.dib.2025.111599","url":null,"abstract":"The Vehicular Reference Misbehavior Dataset (VeReMi) is a vital resource for advancing Intelligent Transportation Systems (ITS) and the Internet of Vehicles (IoV). However, its large size (∼7 GB) and inherent class imbalance pose significant challenges for machine learning model development. This paper presents a preprocessing framework to enhance VeReMi’s usability and relevance. Through 10% down-sampling, the dataset was reduced to ∼724 MB, making it computationally manageable. Biases were addressed by balancing benign and malicious samples through synthesis and identifying benign instances using predefined criteria. A refined feature set, including key attributes like rcvTime, pos_0, pos_1, and attack_type (renamed attacker_type), was selected to improve machine learning compatibility. This preprocessing pipeline effectively maintains data integrity and preserves the representativeness of malicious patterns. The optimized dataset is well-suited for ITS and IoV applications, such as anomaly detection and network security, underscoring the crucial role of preprocessing in overcoming real-world constraints and enhancing model performance.","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60","pages":"Article 111599"},"PeriodicalIF":1.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143927365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
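The two preprocessing steps named in the VeReMi abstract above, 10% down-sampling followed by class balancing, can be sketched in plain Python. This is a minimal illustration under stated assumptions and not the authors' pipeline: the abstract says balancing was done "through synthesis" without describing the method, so random oversampling of the minority class is swapped in as a simple stand-in. The field names `rcvTime`, `pos_0`, `pos_1`, and `attacker_type` come from the abstract; treating `attacker_type == 0` as benign, and the helpers `downsample` and `balance_by_oversampling`, are assumptions made for the example.

```python
import random


def downsample(rows, fraction, seed=0):
    """Keep a random fraction of the rows, reproducibly."""
    rng = random.Random(seed)
    keep = max(1, round(len(rows) * fraction))
    return rng.sample(rows, keep)


def balance_by_oversampling(rows, is_benign, seed=0):
    """Duplicate random minority-class rows until both classes are equal in size.

    Stand-in for the unspecified synthesis step described in the abstract.
    """
    rng = random.Random(seed)
    benign = [row for row in rows if is_benign(row)]
    malicious = [row for row in rows if not is_benign(row)]
    if not benign or not malicious:
        return list(rows)  # nothing to balance against
    minority, majority = sorted((benign, malicious), key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def is_benign(row):
    # Assumption for this sketch: attacker_type 0 marks a benign message.
    return row["attacker_type"] == 0


# Made-up records: 750 benign and 250 malicious messages.
rows = [
    {"rcvTime": float(i), "pos_0": i * 1.5, "pos_1": i * 0.5,
     "attacker_type": 0 if i % 4 else 1}
    for i in range(1000)
]

sample = downsample(rows, 0.10)
balanced = balance_by_oversampling(rows, is_benign)
print(len(sample), len(balanced))
```

Fixing the random seed keeps the reduced dataset reproducible. Oversampling by duplication adds no new information, so a real pipeline would likely apply it only to the training split, or replace it with a genuine synthesis method as the paper reports.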