Data in Brief | Pub Date: 2025-08-01 | DOI: 10.1016/j.dib.2025.111863
Manon Pull , Lionel Alletto , Eric Justes , Jay Ram Lamichhane , Elana Dayoub , Guillaume Hustet-Caou , Emilie Sitnikow , Noémie Gaudio , Pierre Casadebaig , Rémi Mahmoud , Neïla Ait Kaci Ahmed , Julie Constantin , Antoine Couëdel , Noémie Deschamps , Eric Lecloux , Damien Marchand , François Perdrieux , Didier Raffaillac , Lucie Souques , Gilles Tison , Célia Seassau
{"title":"A comprehensive dataset gathering 37 cover crop field experiments across France (2004–2022): plant and soil-related agronomic variables on 33 crop species from five botanical families","authors":"Manon Pull , Lionel Alletto , Eric Justes , Jay Ram Lamichhane , Elana Dayoub , Guillaume Hustet-Caou , Emilie Sitnikow , Noémie Gaudio , Pierre Casadebaig , Rémi Mahmoud , Neïla Ait Kaci Ahmed , Julie Constantin , Antoine Couëdel , Noémie Deschamps , Eric Lecloux , Damien Marchand , François Perdrieux , Didier Raffaillac , Lucie Souques , Gilles Tison , Célia Seassau","doi":"10.1016/j.dib.2025.111863","DOIUrl":"10.1016/j.dib.2025.111863","url":null,"abstract":"<div><div>Cover crops, sown between cash crops to provide ecosystem services, contribute to sustainability but present challenges related to the use of environmental resources outside the main cropping season. This dataset gathers the results of 37 field experiments related to cover crops, whether grown as sole crops or in mixtures, conducted in France over 19 years. It compiles quantitative data from field measurements and laboratory analyses on 33 species from five botanical families, along with information on technical management and climate conditions, providing a valuable resource for assessing cover crop performance.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"61 ","pages":"Article 111863"},"PeriodicalIF":1.4,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144764041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2025-07-28 | DOI: 10.1016/j.dib.2025.111927
Maria Teresa Calcagni , Giovanni Salerno , Gloria Cosoli , Giuseppe Pandarese , Gian Marco Revel
{"title":"Construction and demolition waste material library based on vision systems data","authors":"Maria Teresa Calcagni , Giovanni Salerno , Gloria Cosoli , Giuseppe Pandarese , Gian Marco Revel","doi":"10.1016/j.dib.2025.111927","DOIUrl":"10.1016/j.dib.2025.111927","url":null,"abstract":"<div><div>The sustainable management of Construction and Demolition Wastes (CDWs) represents a crucial challenge for the European Union, considering that this waste stream constitutes one of the main sources of man-made solid waste. The implementation of strategies aimed at the recovery and recycling of these materials is essential to reduce the environmental impact of the construction sector and to foster the transition towards a circular economy model. However, one of the main obstacles to effective reuse and/or recycling of CDWs lies in the complexity of their composition, which includes a wide range of materials such as concrete, bricks, ceramics, metals, and wood, often contaminated with harmful substances. In this context, this data article presents a comprehensive material library designed to collect, organise, and make available data from advanced material characterisation analyses based on vision systems data. Specifically, the library focuses on data obtained through two measurement techniques: infrared (IR) thermography and hyperspectral imaging (HSI). These methodologies were selected for their ability to provide complementary information on the chemical composition and physical properties of materials. The material library was developed as part of an in-depth study of CDW from building demolition and renovation operations in several EU countries. The data collection process included the preparation and analysis of representative samples, with the aim of ensuring maximum accuracy and reproducibility of the measurements. 
The data obtained were standardised and organised in a format compatible with the main statistical analysis and machine learning tools to facilitate their integration into predictive models and decision-making processes. The article describes in detail the library structure, data collection protocols, and practical applications in the fields of waste management and sustainable construction. In addition, the benefits of this resource for the scientific and industrial community are discussed, including the possibility of using the data to develop/fine-tune artificial intelligence (AI) algorithms capable of optimising sorting and recycling processes by recognition and discrimination among different types of CDW material using the aforementioned sensors. The material library represents a significant contribution to addressing the challenges posed by CDW management, promoting a more efficient use of resources and reducing the environmental impact of construction and demolition activities. This extensive database not only facilitates material characterisation and separation but also represents a solid basis for future technological innovation in the construction sector.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"62 ","pages":"Article 111927"},"PeriodicalIF":1.4,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144757788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2025-07-25 | DOI: 10.1016/j.dib.2025.111933
Ahmet Alperen Polat , Sinem Bozkurt Keser , İnci Sarıçiçek , Alim Kerem Erdoğmuş , Ali Kafalı , Ahmet Yazıcı
{"title":"A dataset for state of charge and range estimation of an L5 type electric vehicle that is used for Urban Logistic","authors":"Ahmet Alperen Polat , Sinem Bozkurt Keser , İnci Sarıçiçek , Alim Kerem Erdoğmuş , Ali Kafalı , Ahmet Yazıcı","doi":"10.1016/j.dib.2025.111933","DOIUrl":"10.1016/j.dib.2025.111933","url":null,"abstract":"<div><div>Light electric vehicles contribute to sustainable urban logistics. However, range anxiety is a significant problem in the use of electric vehicles. State of Charge and range estimation are of critical importance to reduce range anxiety. Although there are studies and datasets on the State of Charge and Range Estimation of passenger electric vehicles, the literature on L5 type electric vehicles is not yet mature enough. In this study, measurement data are obtained from an L5 class electric vehicle for urban cargo transportation under different driving dynamics and load conditions. The raw sensor data obtained via the Controller Area Network Bus in the vehicle are stored in 35 separate comma-separated value files, captured at recording frequencies of up to 800 per second. Then, the cleaning process converts the raw data into one-second intervals. The data are recorded during test drives performed on a loop route of approximately two kilometres, under different slopes and environmental conditions, at speeds of 15, 25 and 35 km/h, both loaded and unloaded. 
The presented ESOGU-ML5EV dataset provides a reusable and organized infrastructure for those who want to analyse the energy consumption of electric vehicles used in urban logistics, examine the factors affecting consumption, conduct range estimation studies, or investigate electric vehicle routing problems.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"62 ","pages":"Article 111933"},"PeriodicalIF":1.4,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144757786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
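The cleaning step described in the abstract above, converting high-rate raw samples into one-second intervals, can be illustrated with a minimal sketch. The abstract does not say how values within a second are combined, so averaging is an assumption here, and the function name, the `(timestamp, value)` layout, and the sample values are hypothetical rather than taken from the ESOGU-ML5EV files.

```python
from collections import defaultdict
from statistics import mean


def downsample_to_seconds(samples):
    """Average (timestamp_seconds, value) samples into one-second bins.

    Assumption: each bin is reduced with the arithmetic mean; the dataset's
    own cleaning procedure may use a different reduction.
    """
    bins = defaultdict(list)
    for timestamp, value in samples:
        # Truncate to the whole second (timestamps are assumed non-negative).
        bins[int(timestamp)].append(value)
    return {second: mean(values) for second, values in sorted(bins.items())}


# Hypothetical raw readings: (seconds since start, signal value).
raw = [(0.000, 10.0), (0.500, 12.0), (1.250, 20.0), (1.750, 22.0), (2.100, 30.0)]
per_second = downsample_to_seconds(raw)
print(per_second)
```

Averaging suits slowly varying signals such as battery voltage; for counters or state flags, keeping the last value in each second may be the better choice.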
{"title":"Data from SymSPAN and OSPAN working memory capacity tasks in online and laboratory settings","authors":"Michał Wereszczyński , Paulina Chwiłka , Ewa Smołka , Ewa Ilczuk , Sezin Öner , Krystian Barzykowski","doi":"10.1016/j.dib.2025.111923","DOIUrl":"10.1016/j.dib.2025.111923","url":null,"abstract":"<div><div>The present dataset comprises the performance of adult participants on two experimental tasks designed to measure working memory capacity: the Symmetry Span (SymSPAN) and Operation Span (OSPAN) tasks. Initially, a large sample of 566 participants completed these tasks online. From this pool, a random subset of individuals representing low, medium, and high levels of working memory capacity were invited to participate in two laboratory sessions. In these sessions, spaced one week apart, participants completed the same tasks again. The dataset includes complete performance data from both tasks, along with demographic information such as participants’ age and gender. This relatively large dataset offers valuable opportunities for exploratory research on working memory capacity, including analyses of its relative stability, variations over time and across testing environments, individual differences, and contributions to meta-analyses.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"62 ","pages":"Article 111923"},"PeriodicalIF":1.4,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144757789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Suprasubduction geochemical dataset of ultramafic minerals in Southern Iran: The Ab-Bid complex","authors":"Mahdieh Mohammadi , Hamid Ahmadipour , Abbas Moradian , Daniele Brunelli , Reza Derakhshani","doi":"10.1016/j.dib.2025.111919","DOIUrl":"10.1016/j.dib.2025.111919","url":null,"abstract":"<div><div>This article presents a curated dataset of major and trace element compositions from ultramafic minerals in the Ab-Bid complex, an ophiolitic massif within the Esfandagheh–Hadji Abad mélange zone in Southern Iran. Samples were collected from orthopyroxenite dykes and their host peridotites. Major element analyses were performed using wavelength-dispersive electron microprobe analysis (EPMA), and trace elements were measured via laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS). The dataset comprises six structured Excel tables covering orthopyroxene, clinopyroxene, olivine, and spinel compositions, including rare earth and high field strength element distributions. Analytical metadata such as spot identifiers, standardization protocols, and operating conditions are included to ensure reproducibility. The data facilitate applications in melt–rock interaction modeling, mineral thermometry, and suprasubduction zone geochemical comparison. 
Researchers interested in mantle processes, subduction-related metasomatism, or petrological database development will find this dataset particularly valuable.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"62 ","pages":"Article 111919"},"PeriodicalIF":1.4,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144757692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2025-07-25 | DOI: 10.1016/j.dib.2025.111926
Ali Albada , Soo-Wah Low , Rabie A. Ramadan , Khalid Al Qatiti , Muataz Salam Al-Daweri
{"title":"Ex-ante determinants of IPOs: A dataset for the Malaysian IPOs","authors":"Ali Albada , Soo-Wah Low , Rabie A. Ramadan , Khalid Al Qatiti , Muataz Salam Al-Daweri","doi":"10.1016/j.dib.2025.111926","DOIUrl":"10.1016/j.dib.2025.111926","url":null,"abstract":"<div><div>This article introduces a comprehensive dataset for investigating the ex-ante determinants of Initial Public Offerings (IPOs) first-day initial returns, investor demand, and the information gap experienced by investors for listed firms on Bursa Malaysia over the period 2004–2021. The final sample comprises 350 IPOs priced using the fixed-price method, which were hand-collected from multiple sources. The data in this dataset plays an important role in providing insightful decision-making tips for prospective investors operating in an environment characterized by high information asymmetry. Furthermore, the dataset opens avenues for future research, such as cross-country studies exploring the impact of country-specific factors on IPO initial return outcomes.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"62 ","pages":"Article 111926"},"PeriodicalIF":1.4,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144757790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2025-07-24 | DOI: 10.1016/j.dib.2025.111911
Manuella Aschoff Lima, Daniel Cruz, Diego Ramon Silva, Dilainne Daniel Albuquerque, Daniel Faustino Lacerda, Rostand Costa, Guido Lemos de Souza Filho, Tiago Maritan de Araújo
{"title":"VLibrasBD: A Brazilian Portuguese–Brazilian sign language (Libras) bilingual text dataset designed to support neural machine translation","authors":"Manuella Aschoff Lima, Daniel Cruz, Diego Ramon Silva, Dilainne Daniel Albuquerque, Daniel Faustino Lacerda, Rostand Costa, Guido Lemos de Souza Filho, Tiago Maritan de Araújo","doi":"10.1016/j.dib.2025.111911","DOIUrl":"10.1016/j.dib.2025.111911","url":null,"abstract":"<div><div>VLibras-DB is a bilingual text corpus in Brazilian Portuguese (BP) and Brazilian Sign Language (Libras), designed and developed to support the creation of machine translation systems from BP-to-Libras. The corpus adopts a textual notation for Libras known as gloss, which serves as an interlingua between the source and target languages. To support this process, we initially defined a set of grammatical rules specific to Libras. Based on this notation, a bilingual textual database was built by a team of ten Libras interpreters, resulting in a corpus comprising 127,349 BP–Libras translation pairs. The dataset includes approximately 72,000 general-purpose sentences and around 55,000 sentences extracted from Brazilian federal government content and services. The dataset was carefully constructed to include a wide variety of lexical and syntactic phenomena relevant to Libras translation, such as directional verbs, intensifiers, negation, and word-sense disambiguation. The resulting resource provides not only a substantial volume of parallel data but also a linguistically informed foundation for training and evaluating NMT models, contributing significantly to the advancement of accessible language technologies for the Deaf community. This comprehensive dataset is particularly significant for Neural Machine Translation (NMT) as it provides a much-needed, high-quality resource to train and evaluate NMT models for this low-resource language pair, facilitating advancements in BP-to-Libras translation systems. 
Beyond its direct application in NMT, VLibrasBD serves as a foundational linguistic resource for natural language processing, supporting tasks such as comparative linguistic analysis, bilingual embedding training, and the development of assistive technologies to enhance multilingual communication and information accessibility.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"62 ","pages":"Article 111911"},"PeriodicalIF":1.4,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144771202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2025-07-23 | DOI: 10.1016/j.dib.2025.111915
Andrew Katumba , Sulaiman Kagumire , Joyce Nakatumba-Nabende , John Quinn , Sudi Murindanyi
{"title":"A curated crowdsourced dataset of Luganda and Swahili speech for text-to-speech synthesis","authors":"Andrew Katumba , Sulaiman Kagumire , Joyce Nakatumba-Nabende , John Quinn , Sudi Murindanyi","doi":"10.1016/j.dib.2025.111915","DOIUrl":"10.1016/j.dib.2025.111915","url":null,"abstract":"<div><div>This data article describes a curated, crowdsourced speech dataset in Luganda and Kiswahili, created to support text-to-speech (TTS) development in low-resource settings. The dataset is derived from Mozilla’s Common Voice corpus and includes only validated utterances from female speakers. A multi-step curation process was used to enhance the consistency and quality of the data. Speakers were first manually selected based on similarities in intonation, pitch, and rhythm, then validated using acoustic clustering with pitch features and mel-frequency cepstral coefficients (MFCCs). Audio files were preprocessed to remove leading and trailing silences using WebRTC voice activity detection, denoised with a causal waveform-based DEMUCS model, and filtered using WV-MOS, an automatic speech quality scoring tool. Only clips with a predicted MOS score of 3.5 or higher were retained. The final dataset contains over 19 h of Luganda and 15 h of Kiswahili recordings from six female speakers per language, each paired with a text transcription. 
This dataset is designed to support speech generation research in Luganda and Kiswahili and enable reproducible experimentation in end-to-end TTS systems.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"62 ","pages":"Article 111915"},"PeriodicalIF":1.4,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144757689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
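The final filtering step in the abstract above, retaining only clips with a predicted MOS of 3.5 or higher, can be sketched as follows. This sketch assumes the quality scores have already been produced by the scoring tool and stored alongside each clip; it does not call WV-MOS itself, and the field names `path` and `predicted_mos` as well as the example scores are hypothetical.

```python
MOS_THRESHOLD = 3.5  # retention cutoff stated in the abstract


def keep_high_quality(clips, threshold=MOS_THRESHOLD):
    """Return the clips whose predicted MOS is at or above the threshold."""
    return [clip for clip in clips if clip["predicted_mos"] >= threshold]


# Hypothetical clip records with precomputed quality scores.
clips = [
    {"path": "clip_a.wav", "predicted_mos": 3.9},
    {"path": "clip_b.wav", "predicted_mos": 3.5},
    {"path": "clip_c.wav", "predicted_mos": 3.1},
]
retained = keep_high_quality(clips)
print([clip["path"] for clip in retained])
```

The comparison is inclusive (`>=`), matching the abstract's "3.5 or higher" wording, so a clip scoring exactly 3.5 is kept.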
{"title":"Tamarind health assessment dataset: Images of shelled, unshelled, and mixed tamarind pods","authors":"Amol Bhosle , Deepali Godse , Sandip Thite , Kailas Patil , Touhid Bhuiyan","doi":"10.1016/j.dib.2025.111917","DOIUrl":"10.1016/j.dib.2025.111917","url":null,"abstract":"<div><div>This data paper provides an image dataset that includes 8432 high-quality images of <em>Tamarindus indica</em> [1] (tamarind), categorized into six types: Shelled Healthy Single, Shelled Healthy Multiple, Unshelled Healthy Single, Unshelled Healthy Multiple, Shelled Unhealthy Single, and Shelled Unhealthy Multiple. The collection is intended primarily to assist agricultural research as well as machine learning applications for identifying and evaluating quality. Brightness and orientation vary within each category, so the collection showcases a wide variety of images taken under controlled conditions. For accurate Tamarindus indica quality assessment, this dataset offers a useful resource for training and assessing computer vision models and machine learning techniques. Application in agriculture could be possible, enabling rapid, localized quality evaluation, with potential for broader industry adoption when adapted to other crops. 
In order to improve plant quality assessment methods and contribute to the creation of trustworthy automated systems for Tamarindus indica quality evaluation, we invite researchers to investigate this dataset and use creative thinking.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"62 ","pages":"Article 111917"},"PeriodicalIF":1.4,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144750721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief | Pub Date: 2025-07-23 | DOI: 10.1016/j.dib.2025.111913
Brenna Mei M. Concolis
{"title":"High-resolution reanalysis biological ocean data from the Copernicus Marine Service Information for Philippine marine research","authors":"Brenna Mei M. Concolis","doi":"10.1016/j.dib.2025.111913","DOIUrl":"10.1016/j.dib.2025.111913","url":null,"abstract":"<div><div>In the absence of observation data, remotely sensed data provide an effective alternative for characterizing the spatiotemporal dynamics and patterns of oceanographic data. Some of the most important variables are biomass estimates, which describe the productivity of a given area. Analyzing data with such indices is a useful way to identify biological hotspots and shifts in concentrations that could be related to phenomena and changes in the climate. As biomass patterns are crucial in coastal areas, it is important to use high-resolution data at high (daily) frequency to reduce bias and capture significant changes along the coast. The E.U. Copernicus Marine Service Information provides reanalysis data of global biomass content that can be freely accessed by public users. However, users without prior experience of handling large data may have difficulty accessing the data because of the high resolution of the datasets. In addition, processing large data can be challenging for users with hardware limitations. This dataset is provided to help Philippine marine researchers work with net primary productivity, micronekton, and zooplankton, even if they have technical limitations. Daily values, monthly and annual means, climatologies (daily, monthly, and long-term), and anomalies (daily, monthly, and annual) are provided in the public repository. 
The dataset will allow short-term and long-term analysis in the Philippine waters.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"62 ","pages":"Article 111913"},"PeriodicalIF":1.4,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144771203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
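The relationship between the monthly means, climatologies, and anomalies listed in the abstract above can be illustrated with a short sketch. The abstract does not define the baseline period or the exact method, so this sketch assumes the common convention that a monthly climatology is the mean of a calendar month across all available years and that an anomaly is the value minus that climatological mean. The function names and the sample values are hypothetical and are not drawn from the repository.

```python
from collections import defaultdict
from statistics import mean


def monthly_climatology(monthly_means):
    """Map each calendar month to its mean across all years.

    `monthly_means` is a dict keyed by (year, month) tuples.
    """
    by_month = defaultdict(list)
    for (_year, month), value in monthly_means.items():
        by_month[month].append(value)
    return {month: mean(values) for month, values in by_month.items()}


def monthly_anomalies(monthly_means):
    """Subtract the matching calendar-month climatology from each value."""
    climatology = monthly_climatology(monthly_means)
    return {
        (year, month): value - climatology[month]
        for (year, month), value in monthly_means.items()
    }


# Hypothetical monthly mean values for one grid cell.
npp = {(2020, 1): 2.0, (2021, 1): 4.0, (2020, 2): 5.0, (2021, 2): 7.0}
climatology = monthly_climatology(npp)
anomalies = monthly_anomalies(npp)
print(climatology)
print(anomalies)
```

Under this convention a positive anomaly marks a month that was more productive than is typical for that time of year, which is the kind of shift the abstract suggests looking for.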