{"title":"Knowledge base dataset on nature-based solutions for wastewater treatment and stormwater management","authors":"Josep Pueyo-Ros , Massimiliano Riva , Gisela Gonzalvo-Henry , Sophie Guillaume-Ruty , Joaquim Comas","doi":"10.1016/j.dib.2025.111469","DOIUrl":"10.1016/j.dib.2025.111469","url":null,"abstract":"<div><div>We introduce a knowledge base on Nature-Based Solutions (NBS) for wastewater treatment and stormwater management. The knowledge base includes (i) a catalogue of solutions and their variables, such as type of water that they can manage, ecosystem services provided, and operational constraints; (ii) a dataset of scientific publications related to each of the solutions included in the catalogue; and (iii) a dataset of monitoring samples collected on treatment performance of NBS for wastewater treatment, including biological oxygen demand, chemical oxygen demand, ammonia, nitrate, total nitrogen, phosphate and pathogens (Escherichia coli and helminth eggs). Data collection methods employed to build the knowledge base included elicitation workshops, expert assessments, literature review and experimental data. A notable inclusion is the database of scientific publications, boasting 513 entries and providing a dynamic reference point for solution insights. Users can actively contribute to the database, ensuring its continuous enrichment and relevance. Whereas the goal of the knowledge base is to feed the algorithms developed in the Nat4Wat tool, this data can be used for several purposes, from modelling surface requirements for different NBS to being used as a sandbox for data science students. 
All the data can be accessed directly using a REST API of the Nat4Wat tool, which downloads the latest version of the different datasets, or can be downloaded from a static repository where versions are regularly updated.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111469"},"PeriodicalIF":1.0,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143696168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Electroencephalographic responses before, during, and after upper limb paired associative stimulation","authors":"Yumi Shikauchi , Kazumasa Uehara , Yuka O. Okazaki , Keiichi Kitajo","doi":"10.1016/j.dib.2025.111467","DOIUrl":"10.1016/j.dib.2025.111467","url":null,"abstract":"<div><div>Paired associative stimulation (PAS) is a non-invasive protocol involving repeated stimulus pairs to activate two cortical areas alternately, inducing Hebbian-like plasticity. However, its neurophysiological impacts remain unclear. To determine the changes that occur in the brain during PAS, brain activity during PAS must be measured and distinguished from the electromagnetic artifacts produced by the stimulation. Here, we present a novel dataset of electroencephalography (EEG) measurements during PAS with an inter-stimulus interval of 25 ms (PAS<sub>25</sub>, expected to induce long-term potentiation-like changes) or 35 ms (PAS<sub>35</sub>, no expected change). This dataset includes raw data and pre-processed data with electromagnetic artifacts removed. In both cases, electrical stimulation of the right ulnar nerve preceded transcranial magnetic stimulation of the left primary motor cortex. EEG was also measured before and after the PAS sessions, with electrical or magnetic stimulation alone. To demonstrate the quality of the data, we summarize the stability of the stimulation site and the event-related potentials before, during, and after PAS. 
This dataset will enable observing brain dynamics due to the accumulation of stimulations during PAS and differences in responsiveness to stimulations before and after PAS.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111467"},"PeriodicalIF":1.0,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143696162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-03-14, DOI: 10.1016/j.dib.2025.111465
I Gusti Agung Gede Arya Kadyanan, Ngurah Agus Sanjaya ER, Ida Bagus Gede Dwidasmara, I Kadek Dwi Adnyana, I Komang Widia Pratama, Ngakan Made Alit Wiradhanta, I Agung Gede Ary Mahayasa
{"title":"Balinese traditional building architecture dataset","authors":"I Gusti Agung Gede Arya Kadyanan, Ngurah Agus Sanjaya ER, Ida Bagus Gede Dwidasmara, I Kadek Dwi Adnyana, I Komang Widia Pratama, Ngakan Made Alit Wiradhanta, I Agung Gede Ary Mahayasa","doi":"10.1016/j.dib.2025.111465","DOIUrl":"10.1016/j.dib.2025.111465","url":null,"abstract":"<div><div>This study introduces a dataset focused on Balinese traditional architecture, comprising two main categories, residential and sacred buildings. The dataset includes 14 types of residential buildings with 68 images and 29 types of sacred buildings with 76 images. Each design is represented through 2D drawings, such as front, side, section views, and floor plans, created using AutoCAD software. These drawings are meticulously derived from traditional Balinese architectural texts to ensure alignment with cultural norms. The completed designs were exported into JPG format using Figma, with accompanying metadata that includes size and structural details to enhance usability. This dataset aims to support researchers, cultural preservationists, and educators in exploring and preserving Balinese architectural heritage.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111465"},"PeriodicalIF":1.0,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143843344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-03-14, DOI: 10.1016/j.dib.2025.111457
Marta Fernández-Olmos , Jorge Fleta-Asín , Talía Gómez-Aguas , Fernando Muñoz , Carlos Sáenz-Royo
{"title":"Improved database of public-private partnerships from World Bank with imputed economic, institutional and conflict data.","authors":"Marta Fernández-Olmos , Jorge Fleta-Asín , Talía Gómez-Aguas , Fernando Muñoz , Carlos Sáenz-Royo","doi":"10.1016/j.dib.2025.111457","DOIUrl":"10.1016/j.dib.2025.111457","url":null,"abstract":"<div><div>The World Bank's database on private participation in infrastructure (PPI) projects provides detailed information on these initiatives. However, the original dataset includes imputed macro-level data for the countries that is outdated, lacks assigned ISO country codes, and is not linked to other standard country-level variables necessary for proper analysis and control by territory. In the improved version of the database, 10,958 project observations from 1900 to 2021 have been supplemented with ISO 2 and 3 country codes, enabling accurate integration with other databases. Additionally, 49 new variables related to economic, institutional, and conflict data are incorporated by country and year. This enhanced database ensures that researchers can retain critical World Bank information that might otherwise be lost in future updates, as it is not always preserved in repositories.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111457"},"PeriodicalIF":1.0,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143684410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-03-13, DOI: 10.1016/j.dib.2025.111460
Mavis Sarah Gyimah, James Benjamin Hayfron-Acquah, Rose-Mary Mensah Gyening, Michael Asante, Umar Farouk Ibn Abdulrahman, Evans Kotei
{"title":"AsanteTwiSenti: A Sentiment dataset of Ghanaian Asante Twi Tweets in a multilingual context","authors":"Mavis Sarah Gyimah, James Benjamin Hayfron-Acquah, Rose-Mary Mensah Gyening, Michael Asante, Umar Farouk Ibn Abdulrahman, Evans Kotei","doi":"10.1016/j.dib.2025.111460","DOIUrl":"10.1016/j.dib.2025.111460","url":null,"abstract":"<div><div>Ghanaian Asante Twi is the most widely spoken indigenous language in Ghana. It is a language of scholarship that is very rich in African studies and is taught in many universities across the globe. Despite its popularity, it lacks data resources in Sentiment Analysis, Named Entity Recognition, Part of Speech (POS) tagging, and in particular linguistic corpora. The paper introduces AsanteTwiSenti, a comprehensive sentiment corpus for the Ghanaian Asante Twi language, along with the methods used and challenges encountered in the corpus construction. The AsanteTwiSenti corpus contains 10,095 tweets extracted from 30,507 tweets scraped from the Twitter API. Based on standard guidelines and data preprocessing, 8,438 tweets are labeled as Positive, Negative, Neutral, Ghanaian-Pidgin, Multilingual, and Monolingual. 
The AsanteTwiSenti corpus seeks to bridge the low-resource gap of the Twi Language, inspire the development of local Ghanaian language resources, and impact academic research of Asante Twi for Natural Language Processing (NLP), language preservation, and education.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111460"},"PeriodicalIF":1.0,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143748418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-03-13, DOI: 10.1016/j.dib.2025.111463
Ryan Paulik , Shaun Williams , Misaeli Funaki , Richard Turner
{"title":"Wind damage dataset for buildings from 2016 tropical cyclone Winston in Fiji","authors":"Ryan Paulik , Shaun Williams , Misaeli Funaki , Richard Turner","doi":"10.1016/j.dib.2025.111463","DOIUrl":"10.1016/j.dib.2025.111463","url":null,"abstract":"<div><div>Extreme winds caused by tropical cyclones offer a unique opportunity to evaluate physical damage to building structures. On 20 February 2016, Category 5 Tropical Cyclone Winston (TC Winston) made landfall in Fiji, causing damage to over 30,000 buildings. This article presents an empirical wind building damage dataset for Fiji collected from onsite damage assessments in the TC Winston aftermath. The dataset represents over 700 building-specific records of hazard, building and damage variables recorded during a four-day survey in March 2016. Physical damage to building structures, contents, stock, equipment and plant is presented, along with disruption to residential building habitability and non-residential building services. The dataset provides a valuable record of building damage caused by TC Winston extreme winds that can be used with numerical wind model hazard intensity outputs to formulate building-specific wind vulnerability models for damage prediction in future extreme wind events.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111463"},"PeriodicalIF":1.0,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143684413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-03-13, DOI: 10.1016/j.dib.2025.111455
Yasser N. Aldeoes , Pratibha Mahajan , Shilpa Y. Sondkar , Jitendra A. Gaikwad
{"title":"Rolling-element bearing vibration datasets under varying loads and speeds: A study from Vishwakarma Institute of Technology","authors":"Yasser N. Aldeoes , Pratibha Mahajan , Shilpa Y. Sondkar , Jitendra A. Gaikwad","doi":"10.1016/j.dib.2025.111455","DOIUrl":"10.1016/j.dib.2025.111455","url":null,"abstract":"<div><div>Data collection and analysis are critical for identifying and diagnosing issues in rolling-element bearings. Vishwakarma Technologies, Pune, India, has developed a unique rolling-element vibration dataset specifically gathered under controlled static load and motion conditions, adding significant value to existing public datasets. This dataset offers researchers precise vibration data that complements existing features, enabling accurate assessments of bearing conditions. Collected using accelerometers, the dataset also provides insights into bearing deterioration under sustained loads, which can help predict failures and support the development of advanced diagnostic tools. The dataset comprises 50 files that cover a wide range of operating and fault conditions, including varying motor speeds, high-loading scenarios, and both healthy and faulty bearing states. 
It delivers detailed, high-quality information that enhances the detection and diagnosis of rolling-element bearing problems, contributing to more reliable maintenance practices and improved system reliability.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111455"},"PeriodicalIF":1.0,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143696163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-03-10, DOI: 10.1016/j.dib.2025.111454
I Made Agus Wirawan , Ketut Paramarta
{"title":"Acquisition Of Balinese Imagined Spelling using Electroencephalogram (BISE) Dataset","authors":"I Made Agus Wirawan , Ketut Paramarta","doi":"10.1016/j.dib.2025.111454","DOIUrl":"10.1016/j.dib.2025.111454","url":null,"abstract":"<div><div>One of the main goals of today's technology is to create a connected environment between humans and technological devices to perform daily physical activities. However, users with speech disorders cannot use this application. Loss of verbal communication can be caused by injuries and neurodegenerative diseases that affect motor production, speech articulation, and language comprehension. To overcome this problem, Brain-Computer Interfaces (BCI) use EEG signals as assistive technology to provide a new communication channel for individuals who cannot communicate due to loss of motor control. Of the several BCI studies that use EEG signals, no studies have studied Balinese characters. As a first step, this study examines the acquisition of EEG signal data for Balinese character recognition. There are several stages in obtaining EEG signal data for Balinese character spelling imagination in this study: preparation of research documents, preparation of stimulus media, submission of ethical permits, determination of participants, recording process, data presentation, and publication of datasets. 
The resulting datasets comprise raw data and analyzed data for 18 Balinese characters and 6 vowel characters, both spelled and imagined.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111454"},"PeriodicalIF":1.0,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143642535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-03-09, DOI: 10.1016/j.dib.2025.111456
Andreas Hallberg
{"title":"An 81-million-word multi-genre corpus of Arabic books","authors":"Andreas Hallberg","doi":"10.1016/j.dib.2025.111456","DOIUrl":"10.1016/j.dib.2025.111456","url":null,"abstract":"<div><div>This article describes The Arabic E-Book Corpus, a freely available Arabic corpus consisting of 1,745 books (81.5 million words) published by the Hindawi Foundation between 2008 and 2024. The books are of various genres, including fiction and non-fiction, children's literature, plays, and poetry. Most of the texts are editions of works originally published in the 20th century, but the corpus also includes editions of older historical works. Books were retrieved in epub format and converted to plain text and html. Only books published under unrestricted licenses are included. Extensive metadata (title, author, genre, publication date, original publication date, original language, etc.) were collected from colophons and the publisher's website. The corpus was originally collected in order to investigate variation in the use of vowel diacritics across genres, but it is also suitable for other linguistic inquiries, especially those relating to genre, and as a source of texts published under free licenses for training language models.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"60 ","pages":"Article 111456"},"PeriodicalIF":1.0,"publicationDate":"2025-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143684412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data in Brief, Pub Date: 2025-03-07, DOI: 10.1016/j.dib.2025.111452
Nadira Mokarroma , Md Romij Uddin , Imrul Mosaddek Ahmed , AHM Motiur Rahman Talukder , Abul Fazal Mohammad Shamim Ahsan , Zakaria Alam
{"title":"Multivariate analysis for identifying drought-tolerant barley (Hordeum vulgare L.) genotypes using stress indices","authors":"Nadira Mokarroma , Md Romij Uddin , Imrul Mosaddek Ahmed , AHM Motiur Rahman Talukder , Abul Fazal Mohammad Shamim Ahsan , Zakaria Alam","doi":"10.1016/j.dib.2025.111452","DOIUrl":"10.1016/j.dib.2025.111452","url":null,"abstract":"<div><div>Stress indices are widely used to express the impact of drought stress on barley genotypes during the crown root initiation period, a stress that affects production worldwide. There was therefore an urgent need to identify drought-resilient genotypes using the multi-trait genotype-ideotype distance index (MGIDI). This study was conducted to evaluate the grain yield and genetic variation of 50 barley genotypes using stress indices. Under optimal conditions, genotype IBON14 achieved the highest grain yield of 9.55 g/plant, while under drought stress, genotype BD7194 produced 7.31 g/plant. Significant correlations, both positive and negative (ranging from 0.66 to 1.00), were observed among stress tolerance indices and yields. Using the MGIDI index, genotype BD7194 was selected as the most drought-tolerant, followed by BD7188, IBON14, BD8579, and IBON16, at a 5% selection intensity. Factor analysis within the MGIDI revealed diverse tolerance and susceptibility indices, emphasizing the robustness of certain genotypes, all of which were grouped under a single factor. The selected genotypes exhibited a selection gain between 39.9% and 113%. Moreover, the selection differential, calculated from predicted values, varied from 0.25 to 2.54, and broad-sense heritability was determined to be ≥0.99. 
This study emphasizes the usefulness of the MGIDI index in selecting drought-tolerant barley genotypes, with BD7194 proving to be the most resilient, exhibiting high genetic stability and selection gains.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"59 ","pages":"Article 111452"},"PeriodicalIF":1.0,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143591843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}