{"title":"A multi-class driver behavior dataset for real-time detection and road safety enhancement","authors":"Arafat Sahin Afridi, Arafath Kafy, Ms. Nazmun Nessa Moon, Md. Shahriar Shakil","doi":"10.1016/j.dib.2025.111529","DOIUrl":"10.1016/j.dib.2025.111529","url":null,"abstract":"<div><div>This paper introduces a novel dataset designed to support the development of AI-driven driver monitoring systems. The dataset captures real-world driver behaviors under diverse driving conditions, including private vehicles and public buses, in Dhaka, Bangladesh. It comprises 7286 high-resolution images categorized into five behavioral classes: Safe Driving, Talking on the Phone, Texting, Turning, and Other Distracting Behaviors. The dataset reflects natural variations in driver behavior, such as different lighting conditions, angles, and vehicle types, making it highly applicable to real-world scenarios. By providing a comprehensive and annotated dataset, we aim to support the development of intelligent transportation systems and contribute to reducing accidents caused by distracted driving. The dataset is publicly available and can be used to train and evaluate machine learning models for real-time driver behavior detection.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111529"},"PeriodicalIF":1.0,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143820393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reflected Light Microscopic Iron ore image dataset for iron ore characterization","authors":"Shama Firdaus , Shamama Anwar , Subrajeet Mohapatra , Prabodha Ranjan Sahoo","doi":"10.1016/j.dib.2025.111540","DOIUrl":"10.1016/j.dib.2025.111540","url":null,"abstract":"<div><div>The dataset contains two folders, “IronOreRLM” and “Sample Images”. The Sample Images folder contains a few images from each of the grades included in the study, 12 images in total. This folder acts as an abstract of the full dataset and was created for preview purposes. The IronOreRLM folder is the main dataset, containing a total of 563 reflected light microscopic (RLM) images of iron ores collected from various mines across India. These RLM images are a valuable source of information about the ores, providing insights into constituent elements, ore quality, structure, and more. Various analyses can be conducted on this dataset to extract meaningful information from the images. The primary goal of acquiring this dataset is to automate the chemical-extensive tasks in mineral processing by leveraging the capabilities of computer vision. While the research work associated with the dataset has been cited in this article, it does not limit the scope of the dataset.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111540"},"PeriodicalIF":1.0,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143854736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Holy Quran Kurdish Sorani translation dataset for language modelling","authors":"Muhammad Bamoki , Shakhawan Hares Wady , Soran Badawi","doi":"10.1016/j.dib.2025.111533","DOIUrl":"10.1016/j.dib.2025.111533","url":null,"abstract":"<div><div>The Holy Quran serves as a foundational text in Islamic theology and has been translated into numerous languages across the globe. This paper introduces a manual translation of the Holy Quran into the Kurdish language, specifically designed to aid natural language processing (NLP) research and linguistic analysis. The translation process employed a thorough methodology that combined advanced linguistic tools with the expertise of bilingual religious scholars, translators, and professional proofreaders over several years. Careful attention was given to maintaining both semantic accuracy and theological precision, ensuring a faithful representation of the original Arabic text. The dataset comprises two primary files: a raw translation and a refined linguistic version. We performed various statistical analyses, including the identification of the top 20 most frequent words, a comparative analysis of verse lengths between the Kurdish and Arabic versions, and an evaluation of unique word distributions in both the raw and processed texts. This Kurdish Quran translation dataset represents a significant resource for computational linguistics, particularly in the development of neural machine translation models and in linguistic research focused on under-resourced languages.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111533"},"PeriodicalIF":1.0,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143843347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dataset creation of thermal images of pomegranate for internal defect detection","authors":"Ashvini Gaikwad, Manoj Deshpande, Varsha Bhole","doi":"10.1016/j.dib.2025.111538","DOIUrl":"10.1016/j.dib.2025.111538","url":null,"abstract":"<div><div>Datasets are crucial in various fields, especially in the context of machine learning, data science and research. Datasets are used to train machine learning models. A model learns patterns and relationships from the data it is exposed to. The dataset used for training a machine learning model should be diverse and contain sufficient samples of the desired categories. This paper presents the steps taken, and their outcomes, in preparing a dataset of digital and thermal images of pomegranates for recognising internal defects. Defects in fruits are often categorised as surface defects and internal defects. Surface defects can be recognised in a digital RGB image, but such an image fails to give insight into the internal structure of the fruit, which is often what is of interest. Thermal images can be used to detect internal defects in fruits. When a fruit is subjected to a temperature difference relative to its surroundings, the thermal emissions from the fruit captured by a thermal camera (the thermal image) give key information about internal damage in the fruit. Internal defects appear in the thermal image as variations in temperature between adjacent pixels. k-means segmentation is applied to the thermal images to identify internal defects in pomegranates and categorise them as no defect, minor defect, or major defect. This information is useful for training machine learning algorithms intended for bulk processing in the field of fruit defect detection and classification.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111538"},"PeriodicalIF":1.0,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143833619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Historical spatio-temporal data on North American radical environmental direct-action events","authors":"Zack W. Almquist, Benjamin E. Bagozzi, Daria Blinova","doi":"10.1016/j.dib.2025.111543","DOIUrl":"10.1016/j.dib.2025.111543","url":null,"abstract":"<div><div>Social and political event data are widely used in scientific research. However, event data concerning the direct actions of radical environmental groups is comparatively scarce, due in large part to inconsistent news coverage and the clandestine nature of the groups involved. Leveraging original reports maintained by radical environmental groups and their allies, this article codes historical spatio-temporal event data on radical environmental direct-action events in the United States and Canada during a period of heightened prominence in radical environmentalism: 1995-2007. The article's event-level data include information on event type, date and geolocation, and the target of each event, as well as the original textual reports of each coded event. This data will facilitate a wide variety of qualitative and quantitative analyses of radical environmental activism, alongside validations of recently developed large language model (LLM) tools for event data extraction. We also offer a separate spatio-temporally aggregated version of these same data. This second dataset is aggregated to the 0.5 × 0.5 decimal-degree spatial grid-year level and adds environmental, environmental-group, and social correlates. Accordingly, this second dataset will readily enable spatio-temporal statistical analyses of radical environmental direct-action events, their causes, and their determinants, phenomena that have been previously under-explored in large-N studies.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111543"},"PeriodicalIF":1.0,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143820395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"uCite: The union of nine large-scale public PubMed citation datasets with reliability filtering","authors":"Liri Fang , Malik Oyewale Salami , Griffin M. Weber , Vetle I. Torvik","doi":"10.1016/j.dib.2025.111535","DOIUrl":"10.1016/j.dib.2025.111535","url":null,"abstract":"<div><div>There has been a recent push to make public, aggregate, and increase coverage of bibliographic citation data. Here we describe uCite, a citation dataset containing 564 million PubMed citation pairs aggregated from the following nine sources: PubMed Central, iCite, OpenCitations, Dimensions, Microsoft Academic Graph, Aminer, Semantic Scholar, Lens, and OpCitance. Of these, 51 million (9%) were labeled unreliable, as determined by patterns of source discrepancies explained by ambiguous metadata, crosswalk, and typographical errors, citing future publications, and multi-paper documents. Each source contributes to improved coverage and reliability, but varies dramatically in precision and recall, estimates of which are contrasted with the Web of Science and Scopus herein.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111535"},"PeriodicalIF":1.0,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143854739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image dataset for foreign object detection in iron ore conveyor belt systems","authors":"Frederico L. Martins de Sousa , Thiago E. Alves de Oliveira , Saul E. Delabrida Silva , Bruno Nazário Coelho","doi":"10.1016/j.dib.2025.111537","DOIUrl":"10.1016/j.dib.2025.111537","url":null,"abstract":"<div><div>This paper presents a dataset of high-speed recordings of iron ore flowing on a laboratory-scale conveyor belt, captured with top-down videography and organized to highlight both regular operation and the presence of foreign objects. The conveyor belt measures 35 cm in width by 1.10 m in length. It operates at adjustable speeds and is powered by an electric motor to transport hematite and selected contaminants, such as wood pieces or plastic fragments. An NVIDIA Jetson TX2, equipped with its onboard OV5693 camera, recorded the footage at 120 frames per second in 1280 × 720 resolution, using a GStreamer pipeline to stream the video directly to disk. Individual frames were then extracted and sorted into subfolders, distinguishing normal operations from segments containing manually introduced anomalies. Additional subsets further categorize objects by type, enabling adaptation to various detection or classification approaches. This resource is intended to facilitate comparative evaluations of image-based detection approaches in a controlled mining context while also supporting extended uses in computer vision research related to industrial material transportation.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111537"},"PeriodicalIF":1.0,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143843224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dataset of a de novo transcriptome assembly for the leaves and rhizomes of a five-year-old Atractylodes chinensis","authors":"Yelin Tian , Lizhi Ouyang , Xinyu Li , Li Xiao , Xu Qiao , Yixuan Chen , Tingting Fang , Yimian Ma","doi":"10.1016/j.dib.2025.111530","DOIUrl":"10.1016/j.dib.2025.111530","url":null,"abstract":"<div><div><em>Atractylodes (A.) chinensis</em> (DC.) Koidz. is a traditional Chinese medicinal plant. The rhizome contains its medicinal component, which consists of abundant essential oils. Sesquiterpenes and atractylodin are the main active ingredients in these essential oils. The leaves, on the other hand, contain fewer medicinally active ingredients. Thus far, studies on the formation mechanism of the active ingredients, especially atractylodin, are still limited. This study used RNA sequencing to reveal the <em>de novo</em> transcriptome of the leaves and rhizomes of a five-year-old <em>A. chinensis</em> plant with divided leaves. High-throughput sequencing data was acquired using the Illumina NovaSeq X Plus system (Illumina, USA) in PE150 mode. After the data was corrected and filtered, the clean data was used for subsequent analysis. Based on the assembled sequence file, the differentially expressed unigenes between the rhizomes and leaves of <em>A. chinensis</em> were analyzed. The assembled unigene file and a table of these differentially expressed unigenes were deposited in the “Mendeley Data” database. The raw data was deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111530"},"PeriodicalIF":1.0,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143820391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Resolution Datasets for Urban Heat Vulnerability Assessment in Urbanized Areas of the Netherlands","authors":"Maha M. Habib , Marjolein van Esch , Maarten van Ham , Wim J. Timmermans","doi":"10.1016/j.dib.2025.111525","DOIUrl":"10.1016/j.dib.2025.111525","url":null,"abstract":"<div><div>The urban heat island effect is increasingly affecting the quality of life in cities, and detailed data is crucial in designing mitigation policies. However, weather stations are predominantly situated outside urban environments, limiting their ability to represent the varying air temperatures within street canyons. This data paper addresses this limitation by presenting a dataset of the modeled daily maximum Urban Heat Island (UHI<sub>max</sub>) effect across 99 Dutch municipalities during the summer of 2023. This is achieved by implementing a semi-empirical equation which incorporates readily available meteorological variables and two key urban morphological indicators, namely the sky view factor and fractional vegetation cover. Two primary datasets are presented: (1) a high-resolution dataset of modeled UHI<sub>max</sub>, and (2) a Sky View Factor dataset. Both datasets are provided in GeoTIFF format at a 5-meter spatial resolution. Additionally, this paper presents a straightforward methodology for obtaining UHI<sub>max</sub> values for other periods. The datasets and accompanying methodology provide valuable resources for advancing urban climate research, urban planning, and heat mitigation strategies in the Netherlands.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111525"},"PeriodicalIF":1.0,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143777520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dataset for gait assessment in Parkinson's disease patients","authors":"Luis Pastor Sánchez-Fernández","doi":"10.1016/j.dib.2025.111522","DOIUrl":"10.1016/j.dib.2025.111522","url":null,"abstract":"<div><div>Patients with Parkinson's disease (PD) can present gait disorders, with slow movements, freezing, short steps, speed changes, shuffling, little arm swing, and festinating gait, among others. The Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS) has a good reputation for uniformly evaluating PD's motor and non-motor aspects. Nevertheless, the clinical motor assessment is based on visual observation, yields qualitative results, and does not capture subtle differences. This paper presents a dataset for gait assessments in Parkinson's patients and healthy control subjects. The dataset includes eight biomechanical indicators and raw signals, allowing other authors to replicate the published methods or create new evaluation procedures or algorithms. The tables with eight biomechanical indicators are related to physician evaluations, including data from healthy control subjects, considering only the dynamic accelerations and gyroscope signals.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111522"},"PeriodicalIF":1.0,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143791912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}