Data in Brief: Latest Articles, Page 9

Data augmentation English-Indonesia-Madurese parallel corpus dataset using neural machine translation

IF 1.4

Data in Brief Pub Date : 2025-09-07 DOI: 10.1016/j.dib.2025.112046

Fairuz Iqbal Maulana, Yaya Heryadi, Gede Putra Kusuma, Widodo Budiharto

Citations: 0

MangoClassify-12: A high-resolution image dataset of twelve indigenous Bangladeshi mango cultivars

IF 1.4

Data in Brief Pub Date : 2025-09-07 DOI: 10.1016/j.dib.2025.112037

Md Sajedur Rahman, Md Mahfuz Ahmed Nahin, Md Mahbubur Rahman, Mollika Rani, Md Ashraful Islam, Al Bashir, Ahmad Shafkat, Bijon Mallik, Yaqoob Majeed

Volume 62, Article 112037

Abstract: A high-resolution image dataset, MangoClassify-12, comprising 3900 JPEG images of twelve indigenous Bangladeshi mango cultivars, was assembled to enable automated classification. Images were captured between early May and July 10th, 2025, from three distinct production regions (Mirpur-2, Dhaka; Phulbari, Dinajpur; Rajshahi) under natural light using four smartphones. All images were reviewed by agricultural experts to exclude damaged or overripe specimens. The dataset covers twelve cultivars: Amrapali, Himsagar, Harivanga, Langra, Fazli, Gopalbhog, Ranibhog, Gobindobhog, Sundari, Banana Mango, Bari-4 and Khirsapat. Metadata are organized in a structured folder hierarchy. MangoClassify-12 is openly accessible via DOI and supports machine learning applications such as variety identification, quality assessment and mobile-based recognition. By providing raw images without predefined splits or augmentations, the dataset offers a flexible benchmark for computer vision research in agriculture.
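Because the images are provided without predefined splits, anyone benchmarking on the dataset has to create their own. The following is a minimal Python sketch of a per-class (stratified) split. The folder layout root/<cultivar>/<image>.jpg, the function names, and the 70/15/15 proportions are assumptions made for illustration; none of them come from the article.

```python
import random
from pathlib import Path


def list_images_by_class(root):
    """Map each class folder under `root` to its sorted JPEG paths.

    Assumes a hypothetical layout root/<cultivar>/<image>.jpg.
    """
    root = Path(root)
    return {
        d.name: sorted(
            p for p in d.iterdir() if p.suffix.lower() in {".jpg", ".jpeg"}
        )
        for d in sorted(root.iterdir())
        if d.is_dir()
    }


def stratified_split(items_by_class, train_pct=70, val_pct=15, seed=0):
    """Split every class separately so each split keeps the class balance.

    Percentages are integers to avoid floating-point rounding surprises;
    whatever is left after train and val goes to test.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    splits = {"train": [], "val": [], "test": []}
    for label, items in sorted(items_by_class.items()):
        items = list(items)
        rng.shuffle(items)
        n_train = len(items) * train_pct // 100
        n_val = len(items) * val_pct // 100
        splits["train"] += [(x, label) for x in items[:n_train]]
        splits["val"] += [(x, label) for x in items[n_train:n_train + n_val]]
        splits["test"] += [(x, label) for x in items[n_train + n_val:]]
    return splits


# Synthetic stand-in for two cultivar folders (file names are made up).
demo = {
    "Langra": [f"langra_{i}.jpg" for i in range(20)],
    "Fazli": [f"fazli_{i}.jpg" for i in range(20)],
}
demo_splits = stratified_split(demo)
print({name: len(part) for name, part in demo_splits.items()})
```

With real data, `stratified_split(list_images_by_class("path/to/dataset"))` would return (path, cultivar) pairs per split. Splitting per class rather than over the pooled file list keeps every cultivar represented in each split.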

Citations: 0

A dataset of lung ultrasound images for automated AI-based lung disease classification

IF 1.4

Data in Brief Pub Date : 2025-09-06 DOI: 10.1016/j.dib.2025.112034

Andrew Katumba, Sudi Murindanyi, Nixson Okila, Joyce Nakatumba-Nabende, Cosmas Mwikirize, Jonathan Serugunda, Samuel Bugeza, Anthony Oriekot, Juliet Bossa, Eva Nabawanuka

Citations: 0

Software bug report dataset from Eclipse projects

IF 1.4

Data in Brief Pub Date : 2025-09-05 DOI: 10.1016/j.dib.2025.112016

Noelia Lopez-Duran, David Romero-Organvidez, Fermín L. Cruz, David Benavides

Volume 62, Article 112016

Abstract: In recent decades, the analysis of data from software projects, including source control systems, defect tracking systems, and code review repositories, has greatly improved our understanding of software development and its evolution. However, obtaining this information can be time-consuming, and the extracted data is not always well-maintained. This paper introduces an extensive dataset generated from Bugzilla repositories, focusing on key products from the Eclipse bug-tracking system. This dataset addresses the need for up-to-date data in existing repositories, preserving crucial historical information that may be lost due to the transition from Bugzilla to newer bug-tracking systems like Jira or GitHub Issues. Our dataset includes 301,378 bug reports along with all related information, organised into different folders that indicate the project in which the bug was filed. Additionally, we present a custom and lightweight Command Line Interface (CLI) tool designed to efficiently extract detailed information from Bugzilla repositories, automating data collection across various Bugzilla instances. The dataset and tool can be utilized for defect prediction, software maintenance, and evolutionary analysis. To the best of our knowledge, this is the largest, most complete, and up-to-date dataset of Eclipse bug reports available.

Citations: 0

Planes of thought dataset: A new dataset for the measurement of human thought by artificial intelligence model based on Cloninger's theory

IF 1.4

Data in Brief Pub Date : 2025-09-04 DOI: 10.1016/j.dib.2025.112028

Atra Joudaki, Leyli Mohammad Khanli, Alireza Farnam, Yashar Sarbaz, Jafar Tanha

Volume 62, Article 112028

Abstract: Today, we have seen remarkable progress in artificial intelligence, especially in natural language processing, chatbots, and sentiment analysis. Using sentiment analysis techniques, chatbots can better understand what users say and produce more useful answers. The better a chatbot understands what users say, the more interaction between machine and human becomes possible. To this end, artificial intelligence systems must be able to go beyond sentiment analysis: models must be able to describe and measure thought. To address this challenge, we have created a dataset using Cloninger's theory. In this theory, Cloninger created a global model of human thought and its development, considering the evolution of animal learning abilities, in order to measure thought. Since human thought has not been measured before and accurate measurement is required to perform scientific work, our goal in providing this dataset is to enable artificial intelligence models to do this. Cloninger divided human thought into five planes: sexual (2), material (3), emotional (4), intellectual (5), and spiritual (7). To collect this dataset, three experts labeled the 10,000 most frequently used dictionary words according to Cloninger's theory. We then used these labeled words as ground truths to label sentences. In this dataset, we have labeled 20,000 sentences using this theory so that the dataset can be used to help artificial intelligence models better understand users' statements.
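The abstract says that labeled words served as ground truth for labeling sentences but does not describe the aggregation rule. One plausible way to do this, shown purely as an assumption, is a majority vote over the plane codes of the known words in a sentence. The toy lexicon, the function name, and the tie-breaking rule below are all hypothetical and are not the authors' method.

```python
import re
from collections import Counter

# Plane codes as listed in the abstract.
PLANES = {2: "sexual", 3: "material", 4: "emotional", 5: "intellectual", 7: "spiritual"}

# Toy word-to-plane lexicon, invented for illustration only.
TOY_LEXICON = {"money": 3, "house": 3, "love": 4, "think": 5, "pray": 7}


def label_sentence(sentence, word_labels):
    """Return the plane code that most labeled words in the sentence vote for.

    Returns None when no word of the sentence is in the lexicon.
    Ties are broken by taking the smallest plane code, which is an
    arbitrary but deterministic choice.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    votes = Counter(word_labels[t] for t in tokens if t in word_labels)
    if not votes:
        return None
    top = max(votes.values())
    return min(code for code, count in votes.items() if count == top)


code = label_sentence("I think about money and my house", TOY_LEXICON)
print(code, PLANES[code])
```

A vote-based rule is easy to audit because every sentence label can be traced back to the individual word labels that produced it, at the cost of ignoring word order and context.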

Citations: 0

LLM-based assessment of HTTPS cybersecurity awareness: Dataset from Moroccan web users and webmasters

IF 1.4

Data in Brief Pub Date : 2025-09-04 DOI: 10.1016/j.dib.2025.112024

Abdelhadi Zineddine, Abdeslam Rehaimi, Mohamed Zaoui, Yousra Belfaik, Yassine Sadqi, Said Safi

Volume 62, Article 112024

Abstract: Cybersecurity awareness plays a fundamental role in protecting digital communications, particularly in the deployment and use of the HTTPS protocol. While previous studies have explored website security practices, there is a lack of available datasets that empirically assess both awareness levels and implementation behaviors of web users and website administrators. This dataset addresses this gap by analyzing cybersecurity awareness and HTTPS-related behaviors of 440 Moroccan voluntary participants, including web users and webmasters. Data was collected via a structured Google Forms survey, disseminated through web development and cybersecurity communities on online platforms such as Facebook, WhatsApp and LinkedIn.

The responses collected from multiple-choice questions (MCQs) and free-text entries (categorized using the GPT-4o large language model (LLM)) were pre-processed and score-encoded according to a predefined mapping scheme. Participants' awareness levels were classified as Low, Moderate, or High based on total scores. To identify behavioral patterns, the unsupervised KMeans clustering algorithm was applied separately to user and webmaster groups. Principal Component Analysis (PCA) and LLM-based interpretation provided insights into awareness profiles and cybersecurity risk behaviors.

The dataset includes raw survey responses, score-encoded data, clustering outputs, and LLM-generated awareness assessment reports. It serves both as supplementary material for a novel hybrid cybersecurity assessment methodology for HTTPS deployment presented in [1], and as a standalone resource for researchers and practitioners examining HTTPS usage, certificate management, and behavioral risk profiling. This dataset is a valuable asset for empirical research and practical improvements in cybersecurity awareness within role-based and regional web ecosystems.
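The score-encoding and Low/Moderate/High classification steps described in the abstract can be illustrated with a short Python sketch. The question identifiers, answer options, point values, and the one-third and two-thirds cut-offs are all hypothetical; the article's actual mapping scheme and thresholds are not given in this listing.

```python
# Hypothetical mapping scheme: question id -> answer -> points.
SCORE_MAP = {
    "checks_padlock": {"always": 2, "sometimes": 1, "never": 0},
    "proceeds_past_cert_warning": {"never": 2, "sometimes": 1, "always": 0},
}


def total_score(response):
    """Sum the points of all answers; unknown questions or answers score 0."""
    return sum(
        SCORE_MAP[question].get(answer, 0)
        for question, answer in response.items()
        if question in SCORE_MAP
    )


def max_score():
    """Highest total a participant can reach under SCORE_MAP."""
    return sum(max(points.values()) for points in SCORE_MAP.values())


def awareness_level(total, maximum):
    """Bucket a total score into thirds of the maximum (assumed cut-offs).

    Integer comparisons avoid floating-point edge cases at the boundaries.
    """
    if 3 * total < maximum:
        return "Low"
    if 3 * total < 2 * maximum:
        return "Moderate"
    return "High"


participant = {"checks_padlock": "always", "proceeds_past_cert_warning": "sometimes"}
score = total_score(participant)
print(score, awareness_level(score, max_score()))
```

The score-encoded vectors (one number per question) are also the natural input for the clustering step the abstract mentions; the total score is only used for the three-level classification.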

Citations: 0

JujubeBruiseNet: A high-resolution image dataset for bruise detection in Ziziphus mauritiana

IF 1.4

Data in Brief Pub Date : 2025-09-03 DOI: 10.1016/j.dib.2025.112031

Md Arham Tabib, Sumyia Sabrin Liza, Md Mizanur Rahman

Volume 62, Article 112031

Abstract: The article presents JujubeBruiseNet, a high-resolution image dataset designed for bruise detection in Ziziphus mauritiana (jujube) fruits. Ziziphus mauritiana is a seasonal fruit often found in late summer to early fall. Bruise detection in this fruit is crucial for post-harvesting, fruit processing, and food packaging. Manual detection of bruises is time-consuming and often inaccurate. Therefore, developing a novel classification model is essential, one that immediately recognizes bruises in the fruits and, as a result, decreases human effort, expenses, and production time in the agriculture sector. The dataset contains a total of 1464 original photos categorized into two classes, Healthy and Bruised. We collected the fruit from the local market and fields near Savar, Dhaka, Bangladesh, with the help of domain experts in the period from 10th March to 20th March 2025. To reduce outside variations and provide uniformity, the photos were taken under precisely controlled lighting. This article offers a major dataset for researchers to develop effective quality assessment models for post-harvesting fruit sorting and classification. Convolutional neural networks (CNNs) and other computer vision models can be trained exclusively using this dataset to increase the precision of agricultural product bruise recognition. The dataset can facilitate research in computer vision-based agricultural monitoring and fruit quality evaluation, and is openly accessible on Mendeley Data under the title "JujubeBruiseNet: A Dataset for Bruise Detection in Ziziphus mauritiana".

Citations: 0

Validated dataset combining simulations and measurements for emission analysis of naturally ventilated dairy barns

IF 1.4

Data in Brief Pub Date : 2025-09-01 DOI: 10.1016/j.dib.2025.112017

Julian Hartje, Abu Zar Shafiullah

Volume 62, Article 112017

Abstract: Quantifying emissions from naturally ventilated livestock buildings is challenging due to the large side wall openings. In addition, measurement campaigns are expensive and time-consuming and are therefore limited to a few short measurement weeks during the year. However, emission factors or annual averages are extrapolated from these data sets. Simulations can complement these data sets by extending them, thus broadening the basis for the extrapolation of emission factors or the evaluation of the barn and management system. The dataset presented consists of solution data from computational fluid dynamics (CFD) simulations of naturally ventilated cattle barns and the corresponding simulation and geometry files. The simulations were validated using data sets from measurement campaigns in three naturally ventilated cattle barns in Germany. Together with weather data from the German Weather Service (DWD), weather situations that occurred outside the measurement weeks could be investigated. With the presented data set, further investigations are possible. Together with the measured data, simulation techniques, data aggregation and the development of new numerical modelling approaches can be investigated in detail.

Citations: 0

A real-world IIoT dataset for predictive maintenance of metalworking fluids

IF 1.4

Data in Brief Pub Date : 2025-09-01 DOI: 10.1016/j.dib.2025.112020

Carlos Cambra, Félix Movilla, Félix de Miguel, Daniel Urda, Nuria Velasco, Álvaro Herrero

Volume 62, Article 112020

Abstract: This article presents a multivariate time series dataset detailing the physicochemical degradation of an industrial metalworking fluid (MWF). The data were collected continuously over several months from a test tank under typical operational conditions at an industrial facility in Spain. Four critical variables were monitored using industrial-grade sensors: pH, temperature, concentration, and conductivity. The dataset is provided in five CSV files. The primary file, measures.csv, contains the preprocessed time series at a uniform 5-minute frequency, with authentic missing data gaps intentionally preserved to reflect real-world sensor and connectivity issues. The four additional files serve as a comprehensive benchmark for data imputation algorithms. Each of these benchmark files corresponds to a single variable and includes the original data alongside imputed values generated by five distinct methods: K-Nearest Neighbours (KNN), a hybrid model (HybridKCL), an LSTM-based Variational Autoencoder (LSTM-VAE), and both pre-trained and fine-tuned versions of the MOMENT foundation model. This resource enables researchers and practitioners to develop, validate, and compare predictive maintenance models, anomaly detection systems, and advanced imputation techniques. Furthermore, it serves as a valuable educational tool for addressing common challenges in industrial IoT data, fostering advancements in sustainable and efficient manufacturing.
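A first step before comparing imputation methods on a series like this is to locate the gaps and measure how long they are. The Python sketch below does this for a regularly sampled series. It assumes that missing readings appear as None after loading (for example, empty CSV cells) and that the sampling step is the 5 minutes stated in the abstract; the function name and the sample pH values are made up for illustration.

```python
from itertools import groupby


def missing_runs(values, step_minutes=5):
    """Find consecutive runs of missing readings in a regularly sampled series.

    Returns a list of (start_index, run_length, duration_minutes) tuples,
    where a reading counts as missing when it is None.
    """
    runs = []
    index = 0
    for is_missing, group in groupby(values, key=lambda v: v is None):
        length = sum(1 for _ in group)
        if is_missing:
            runs.append((index, length, length * step_minutes))
        index += length
    return runs


# Made-up pH readings with two gaps.
ph = [9.1, 9.0, None, None, None, 8.9, None, 8.8]
for start, length, minutes in missing_runs(ph):
    print(f"gap at index {start}: {length} samples ({minutes} min)")
```

Knowing the distribution of gap lengths matters when judging imputation quality, since a method that handles single dropped samples well may behave very differently on gaps that span hours.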

Citations: 0

FTIR spectroscopy and VSC-based colour assessment dataset for comparative analysis of cremated bones

IF 1.4

Data in Brief Pub Date : 2025-09-01 DOI: 10.1016/j.dib.2025.112019

Anu Lillak, Tim Thompson, Mari Tõrv, Ester Oras

Citations: 0