{"title":"Bridging the industrial data gap: Top-down approach from national statistics to site-level energy consumption data","authors":"Enrico Bernelli Zazzera , Matteo Giacomo Prina , Riccardo Marchetti , Steffi Misconel , Giampaolo Manzolini , Wolfram Sparber","doi":"10.1016/j.dib.2025.111365","DOIUrl":"10.1016/j.dib.2025.111365","url":null,"abstract":"<div><div>Detailed data on hard-to-abate industrial sectors is crucial for developing targeted decarbonization measures in energy system modeling, yet such information is rarely available through open sources. This paper presents a top-down methodology to estimate detailed industrial site-level energy and emissions databases by integrating and expanding publicly available data. The methodology addresses three key challenges: (1) the disaggregation of national energy consumption data to site level, (2) the categorization of process heat by four temperature ranges (<100 °C, 100 °C-500 °C, 500 °C-1000 °C, and >1000 °C) and direct use of electricity, and (3) the integration of process emissions from feedstock use in hard-to-abate industrial sectors. The approach is demonstrated through application to the Italian industrial sector for the year 2022, resulting in a database that documents site-specific consumption across seven energy sources: solid fossil fuels, manufactured gases, oil and petroleum products, natural gas, biofuels, non-renewable wastes, naphtha and electricity. The method can be replicated for other European countries, providing researchers and policymakers with a standardized approach to create detailed industrial energy databases. Results show that the chemical and petrochemical sector dominates the industrial energy landscape of Italy, followed by iron and steel, non-metallic minerals, and paper and pulp. 
The geographical distribution reveals a concentration of major industrial facilities in northern Italy, with notable exceptions including significant steel production in Taranto (south) and petrochemical complexes in Sicily and Sardinia.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"59 ","pages":"Article 111365"},"PeriodicalIF":1.0,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143388154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-02-01, DOI: 10.1016/j.dib.2024.111219
Mohammad Harun Or Rashid, Md Tanbeer Jubaer, Barisha Chowdhury, Md Minhazul Islam
{"title":"Live and mediated user engagements: A comparative dataset from two Bengali audio-story based youtube channels","authors":"Mohammad Harun Or Rashid , Md Tanbeer Jubaer , Barisha Chowdhury , Md Minhazul Islam","doi":"10.1016/j.dib.2024.111219","DOIUrl":"10.1016/j.dib.2024.111219","url":null,"abstract":"<div><div>The dataset contains user engagement and language-related information from two audio story-producing channels on YouTube. It offers a comparative view of live and mediated engagements, which includes information pertinent to the user's interaction of audio-story based YouTube contents. The speciality of this dataset is the inclusion of textual data of live comments on YouTube videos. It covers the data from July 2022 to February 2024 yielding 230 audio stories of the respective channels. More than 250,000 comments and nearly 300,000 live chats from the videos are included in this dataset. It provides quantitative information of the contents such as number of views, comments and likes. Along with the textual data and numerical engagement-related data, this dataset contains the language categorization of the users’ comments. It is expected that this dataset will be used in further research producing novel insights in different disciplines, uncovering patterns of digital engagement, language use in different platforms, and the dynamics of live versus post-live interactions. Additionally, content creators and marketers can utilize insights from this dataset to optimize their strategies for audience engagement. 
The dataset serves as a valuable resource for cross-disciplinary studies in digital media, linguistics, and social media analysis.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111219"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11719342/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142969977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IDDMSLD: An image dataset for detecting Malabar spinach leaf diseases","authors":"Adnan Rahman Sayeem, Jannatul Ferdous Omi, Mehedi Hasan, Mayen Uddin Mojumdar, Narayan Ranjan Chakraborty","doi":"10.1016/j.dib.2025.111293","DOIUrl":"10.1016/j.dib.2025.111293","url":null,"abstract":"<div><div>Agriculture has always played a vital role in the economic development of Bangladesh. In Agriculture, leaf diseases have become an issue because they can lead to a major drop in both quality and quantity of crops. Therefore, leveraging technology to automatically detect diseases on leaves plays an important role in farming. Malabar Spinach (Basella alba) is a well-known, widely grown leafy vegetable, which is valued for its nutritional benefits. However, there is almost no dataset that can aid in identifying diseases affecting this important crop, which often leads to decreased quality as well as financial drawback. This lack of resources makes it difficult for farmers to recognize and manage common diseases. Our purpose is to solve this problem by creating a unique dataset of Bangladesh's Malabar Spinach leaves that will ease agricultural management and disease detection. Our dataset contains both healthy and diseased samples, categorised into four common ailments: Anthracnose, Bacterial Spot, Downy Mildew, and Pest Damage. We collected 3,006 original images in total. Images were collected from various locations in Bangladesh, including Mirpur, Savar, Sirajganj and Gazipur, with photographs taken under natural lighting conditions at different times of the day. 
This dataset will support further research on Malabar Spinach disease detection by enabling researchers to implement efficient computational models and apply advanced machine learning techniques.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111293"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11787447/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143078914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-02-01, DOI: 10.1016/j.dib.2024.111206
Randall P. Niedz, Kim D. Bowman
{"title":"Image dataset: UAV images and ground data of one ‘Bingo’ mandarin and two ‘Valencia’ orange rootstock trials conducted in Florida","authors":"Randall P. Niedz, Kim D. Bowman","doi":"10.1016/j.dib.2024.111206","DOIUrl":"10.1016/j.dib.2024.111206","url":null,"abstract":"<div><div>The data are aerial images and ground tree measurement data of 3 citrus rootstock trials. Developing new citrus rootstock varieties requires field trials to identify selections with improved horticultural performance. A bud from a scion variety is grafted onto the rootstock and grown in a nursery until the grafted plant is ready to be planted in the field, which takes about one year. Trees in the field are assessed each year by measuring height, canopy diameter in 2 dimensions, overall health, and fruit number and quality factors when the trees begin to have a significant crop (∼3 years). Data collection of each tree is done manually. The image and ground data sets cover 3 rootstock trials: a 3-year-old Bingo mandarin hybrid trial of 206 trees, a 6-year-old Valencia orange trial of 643 trees, and a 7-year-old Valencia orange trial of 648 trees. Data for each trial include aerial images and ground data of height, canopy diameters, and an overall health rating. The combination of ground-validated measures and aerial images makes this data set useful for building AI-based aerial image data collection applications. 
The data will be useful for 1) visualizing the effects of different rootstock selections and varieties on scion growth, effects that may not be fully captured with single measure metrics; and 2) development of image analysis applications and segmentation algorithms that can extract data from the images that are suitable for replacing some or all the ground measures.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111206"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11750505/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143022311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-02-01, DOI: 10.1016/j.dib.2025.111287
Naida Solak, André Ferreira, Gijs Luijten, Behrus Puladi, Victor Alves, Jan Egger
{"title":"GBM-Reservoir: Brain tumor (Glioblastoma Multiforme) MRI dataset collection with ground truth segmentation masks","authors":"Naida Solak , André Ferreira , Gijs Luijten , Behrus Puladi , Victor Alves , Jan Egger","doi":"10.1016/j.dib.2025.111287","DOIUrl":"10.1016/j.dib.2025.111287","url":null,"abstract":"<div><div>In this article, we present a brain tumor database collection comprising 23,049 samples, with each sample including four different types of MRI brain scans: FLAIR, T1, T1ce, and T2. Additionally, one or two segmentation masks (ground truth) are provided for each sample. The first mask is the raw output from the registration process and is provided for all samples, while the second mask, provided particularly for synthetic samples, is a post-processed version of the first, designed to simplify interpretation and optimize it for network training. These samples have been acquired via registration process of 438 samples available at the moment of registration from the original dataset provided by the BraTS 2022 Challenge. Registering each pair of existing brain scans results in two additional scans that retain a similar brain shape while featuring varying tumor locations. Consequently, by registering all possible pairs, a dataset originally consisting of n samples can be expanded to n<sup>2</sup> samples. The original dataset was collected from different institutions under standard clinical conditions, but with different equipment and imaging protocols. As a result, the image quality is heterogeneous, reflecting the diversity of clinical practices across institutions. 
This dataset can be utilized for various tasks, such as developing fully automated segmentation algorithms for new, unseen brain tumor cases, particularly through deep learning-based approaches, since ground truth is provided for each sample.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111287"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143131058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Internal transcribed spacer metagenomics data unravelling the core fungal community structure residing the wheat and maize rhizosphere","authors":"Sadia Latif , Rizwana Kousar , Anum Fatima , Naeem Khan , Hina Fatimah","doi":"10.1016/j.dib.2025.111269","DOIUrl":"10.1016/j.dib.2025.111269","url":null,"abstract":"<div><div>Plants are colonized by a vast array of microorganisms that outstrip plant cell densities and genes, thus referred to as plant's second genome or extended genome. The microbial communities exert a significant influence on the vigor, growth, development and productivity of plants by supporting nutrient acquisition, organic matter decomposition and tolerance against biotic and abiotic stresses such as heat, high salt, drought and disease, by regulating plant defense responses. The rhizosphere is a complex micro-ecological zone in the direct vicinity of plant roots and is considered a hotspot of microbial diversity. The exploration and understanding of the rhizosphere microbes can be valuable in sustainable agriculture. The present dataset aimed to reveal the core fungal community residing in the rhizosphere of wheat (<strong><em>Triticum aestivum</em></strong> L.) and maize (<strong><em>Zea mays</em></strong> L.). The rhizosphere fungal communities were explored via amplicon sequencing of the Internal Transcribed Spacer (ITS) region using the IonS5<sup>TM</sup>XL sequencing platform. The data obtained were filtered and the high-quality reads were clustered into Microbial Operational Taxonomic Units (OTUs) at 97 % similarity. Further, the data were subjected to alpha and beta diversity analysis. The OTUs obtained from the wheat rhizosphere soils of Kallar Syedian (TA.KS), Islamabad (TA.ISB) and Mirpur Azad Kashmir (TA.MAK) were 603, 513 and 424, respectively, whereas 616 OTUs were found in the maize rhizosphere soil of Kallar Syedian (ZM.KS). 
The major fungal phyla inhabiting the rhizosphere soils were Ascomycota, accounting for 94 %, 97 %, 95 % and 90 % of the fungal community in ZM.KS, TA.KS, TA.MAK and TA.ISB, respectively. Alpha and beta diversity analysis depicted the presence of considerable variations in the relative abundance of fungal groups residing in the rhizosphere soils. The dataset obtained can be employed in meta-analysis studies that will pave the way toward understanding the core fungal community structure and will directly aid in enhancing crop productivity through rhizosphere engineering.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111269"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11772148/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143058370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-02-01, DOI: 10.1016/j.dib.2025.111345
Emma Menant, Bruno Tassart, Céline Meillier, Frédéric Lemoine, Alexandre Petermann, Daniel Jost, Xavier Jouven
{"title":"Registration of transthoracic impedance signal and ventilation volume data in out-of-hospital cardiac arrest","authors":"Emma Menant , Bruno Tassart , Céline Meillier , Frédéric Lemoine , Alexandre Petermann , Daniel Jost , Xavier Jouven","doi":"10.1016/j.dib.2025.111345","DOIUrl":"10.1016/j.dib.2025.111345","url":null,"abstract":"<div><div>Studying ventilation during out-of-hospital cardiac arrest (OHCA) presents significant challenges due to the limited methods available for monitoring ventilation during Basic Life Support care. Researchers are increasingly focusing on transthoracic impedance (TTI) as a new means of investigating ventilation.</div><div>We employed manual ventilation monitoring devices to record cardiopulmonary resuscitation (CPR), ventilation volumes (Vvol) and TTI data. The TTI signals are registered with the Vvol signals. The Vvol data are considered the ground truth for ventilation detection in our dataset. The dataset comprises data recorded during OHCA involving adult patients. Specifically, the data include TTI signals and automated external defibrillator (AED) analysis markers collected using Defigard Touch 7® AEDs (Schiller Medical, Wissembourg, France), as well as CPR Vvol recorded by manual ventilation monitoring devices (EOlife®, ARCHEON Medical, Besançon, France).</div><div>The TTI signals and Vvol data that are derived from the same OHCA can be registered. This allows better characterization of the TTI signal by identifying when TTI variations are caused by ventilations and distinguishing these from artifacts. 
This registration process makes it possible to position the ventilations on the TTI signal.</div><div>The combination of TTI signals and Vvol data improves the readability of the CPR process by providing a robust method to interpret TTI signals.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"59 ","pages":"Article 111345"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143347859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-02-01, DOI: 10.1016/j.dib.2024.111208
Heather Simon, James Beidler, Kirk R. Baker, Barron H. Henderson, Loren Fox, Chris Misenis, Patrick Campbell, Jeff Vukovich, Norm Possiel, Alison Eyth
{"title":"Expedited modeling of burn events results (EMBER): A screening-level dataset of 2023 ozone fire impacts in the US","authors":"Heather Simon , James Beidler , Kirk R. Baker , Barron H. Henderson , Loren Fox , Chris Misenis , Patrick Campbell , Jeff Vukovich , Norm Possiel , Alison Eyth","doi":"10.1016/j.dib.2024.111208","DOIUrl":"10.1016/j.dib.2024.111208","url":null,"abstract":"<div><div>The Expedited Modeling of Burn Events Results (EMBER) dataset consists of 36-km grid-spacing Community Multiscale Air Quality (CMAQ) photochemical modeling for the summer of 2023. For emissions, these simulations utilized representative monthly and day-of-week anthropogenic emissions from a recent year and preliminary day-specific 2023 fire emissions derived using the BlueSky pipeline. The base model run simulated ozone concentrations across the contiguous US during Apr 11-Sep 29, 2023. Two zero-out model runs simulated ozone levels that would have occurred in the US (1) in the absence of fire emissions (“Zero Fires”) and (2) in the absence of only Canadian wildfire emissions (“Zero Canadian Fires”). Fire impacts on ozone were then estimated as the difference between the ozone simulated in the base EMBER run and the ozone simulated in each of the zero-out model runs. 
EMBER is presented as a screening level dataset due to the emissions limitations and the 36-km grid-spacing used in these simulations.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111208"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11728960/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142977995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-02-01, DOI: 10.1016/j.dib.2025.111286
Abdelkabir Bacha, Ramzi El Idrissi, Khalid Janati Idrissi, Fatima Lmai
{"title":"Comprehensive dataset for fault detection and diagnosis in inverter-driven permanent magnet synchronous motor systems","authors":"Abdelkabir Bacha , Ramzi El Idrissi , Khalid Janati Idrissi , Fatima Lmai","doi":"10.1016/j.dib.2025.111286","DOIUrl":"10.1016/j.dib.2025.111286","url":null,"abstract":"<div><div>This work introduces a new, comprehensive dataset for Fault Detection and Diagnosis (FDD) in inverter-driven Permanent Magnet Synchronous Motor (PMSM) systems. Despite the increasing significance of AI-driven FDD techniques, the domain suffers from a lack of publicly accessible, real-world datasets for algorithm development and evaluation. Our contribution fills this gap by offering a comprehensive, multi-sensor dataset obtained from a bespoke experimental apparatus. The dataset includes different fault cases, such as open-circuit faults, short-circuit faults, and overheating conditions in the inverter switches. The dataset incorporates 8 raw sensor measurements and 15 derived features, recorded at 10 Hz, amounting to 10,892 samples across 9 operational conditions (one normal, eight fault types). By keeping this dataset publicly accessible, we seek to accelerate research in AI-driven fault identification and diagnosis for electric drive systems.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111286"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143131143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-02-01, DOI: 10.1016/j.dib.2024.111246
Daniel W. McKenney, John H. Pedlar, Kevin Lawrence, Stephen R. Sobie, Kaitlin DeBoer, Tiziana Brescacin
{"title":"Spatial datasets of CMIP6 climate change projections for Canada and the United States","authors":"Daniel W. McKenney , John H. Pedlar , Kevin Lawrence , Stephen R. Sobie , Kaitlin DeBoer , Tiziana Brescacin","doi":"10.1016/j.dib.2024.111246","DOIUrl":"10.1016/j.dib.2024.111246","url":null,"abstract":"<div><div>Geospatial climate change projections are critical for assessing climate change impacts and adaptations across a wide range of disciplines. Here we present monthly-based grids of climate change projections at a 2-km resolution covering Canada and the United States. These data products are based on outputs from the 6th Coupled Model Intercomparison Project (CMIP6) and include projections for 13 General Circulation Models (GCMs), three Shared Socio-economic Pathways (SSP1 2.6, SSP2 4.5, and SSP5 8.5), four 30-year time periods (2011–2040, 2021–2050, 2041–2070, and 2071–2100), and a suite of climate variables, including monthly maximum and minimum temperature, precipitation, climate moisture index, and various bioclimatic summaries. The products employ a delta downscaling method, which combines historical normal values at climate stations with broad-scale change projections (or deltas) from GCMs, followed by spatial interpolation using ANUSPLIN. 
Various quality control efforts, described herein, were undertaken to ensure that the final products provided reasonable estimates of future climate.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111246"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11742571/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143001731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}