Data in Brief: Latest Articles

Electron microscopy data on irradiation effects in glassy carbon, nuclear graphite, pyrolytic carbon, and carbon fibers
IF 1.4
Data in Brief Pub Date : 2025-07-24 DOI: 10.1016/j.dib.2025.111918
J. David Arregui-Mena , Takaaki Koyanagi , David A. Cullen , Michael J. Zachman , Yan-Ru Lin , Kyle Everett , Sabrina Gonzalez-Calzada , Phillip D. Edmondson , Tyler J. Gerczak , Yutai Katoh , Nidia C. Gallego
Abstract: Glassy carbon, a monoatomic allotrope of carbon, is a candidate material for components in fission nuclear power systems due to its radiation tolerance. This article presents comprehensive electron microscopy data revealing the effects of neutron and electron irradiation on glassy carbon. For comparison, additional data are provided for pyrolytic graphite and carbon fibers, materials that exhibit similar structural behavior under irradiation. In situ electron irradiation experiments further illustrate the real-time microstructural evolution of glassy carbon during exposure. The dataset is organized into six parts: (1) transmission electron microscopy (TEM) micrographs of as-received and neutron-irradiated glassy carbon; (2) TEM micrographs of neutron-irradiated graphite; (3) TEM micrographs of unirradiated and irradiated carbon–carbon composites; (4) TEM micrographs of pyrolytic carbon specimens in both conditions; (5) scanning transmission electron microscopy (STEM) micrographs of as-received and neutron-irradiated glassy carbon; and (6) in situ electron irradiation data of a glassy carbon particle.
These datasets provide valuable insights into radiation-induced structural changes in carbon-based materials relevant to nuclear applications.
Data in Brief, vol. 62, Article 111918
Citations: 0
VLibrasBD: A Brazilian Portuguese–Brazilian sign language (Libras) bilingual text dataset designed to support neural machine translation
IF 1.4
Data in Brief Pub Date : 2025-07-24 DOI: 10.1016/j.dib.2025.111911
Manuella Aschoff Lima, Daniel Cruz, Diego Ramon Silva, Dilainne Daniel Albuquerque, Daniel Faustino Lacerda, Rostand Costa, Guido Lemos de Souza Filho, Tiago Maritan de Araújo
Abstract: VLibrasBD is a bilingual text corpus in Brazilian Portuguese (BP) and Brazilian Sign Language (Libras), designed and developed to support the creation of BP-to-Libras machine translation systems. The corpus adopts a textual notation for Libras known as gloss, which serves as an interlingua between the source and target languages. To support this process, we initially defined a set of grammatical rules specific to Libras. Based on this notation, a bilingual textual database was built by a team of ten Libras interpreters, resulting in a corpus comprising 127,349 BP–Libras translation pairs. The dataset includes approximately 72,000 general-purpose sentences and around 55,000 sentences extracted from Brazilian federal government content and services. The dataset was carefully constructed to include a wide variety of lexical and syntactic phenomena relevant to Libras translation, such as directional verbs, intensifiers, negation, and word-sense disambiguation. The resulting resource provides not only a substantial volume of parallel data but also a linguistically informed foundation for training and evaluating NMT models, contributing significantly to the advancement of accessible language technologies for the Deaf community.
This comprehensive dataset is particularly significant for Neural Machine Translation (NMT) as it provides a much-needed, high-quality resource to train and evaluate NMT models for this low-resource language pair, facilitating advancements in BP-to-Libras translation systems. Beyond its direct application in NMT, VLibrasBD serves as a foundational linguistic resource for natural language processing, supporting tasks such as comparative linguistic analysis, bilingual embedding training, and the development of assistive technologies to enhance multilingual communication and information accessibility.
Data in Brief, vol. 62, Article 111911
Citations: 0
Advancing the understanding, measurement and monitoring of healthy ageing: A systematically derived framework and dataset mapping associated constructs, measures and measurement approaches
IF 1.4
Data in Brief Pub Date : 2025-07-23 DOI: 10.1016/j.dib.2025.111916
Andreea Alexandra Piriu , Maria Vittoria Bufali , Giulia Cappellaro , Amelia Compagni , Aleksandra Torbica
Abstract: Healthy ageing is a multidimensional process shaped by physical, mental, social and environmental factors across the life course. However, the lack of a standardised framework and inconsistent interpretations of key constructs hinder meaningful comparison across studies and contexts. This article presents a systematically derived framework and structured dataset that capture the constructs, measurement approaches and methodological advancements involved in operationalising healthy ageing.
The framework and dataset originate from the systematic review conducted by Piriu et al. (2025), which comprehensively maps the range of concepts, constructs and operational dimensions – measures, metrics, instruments and scales – used in 55 empirical studies that operationalise healthy ageing, each addressing at least two of the three core domains identified by the World Health Organization: intrinsic capacity (IC), functional ability (FA) and the environment (ENV). The Piriu et al. (2025) framework (hereafter PIETHA) introduces a multilayered categorisation of the healthy ageing construct, articulating it into three domains (IC, FA, ENV), 15 sub-domains, and 84 themes. This structure reflects both conceptual and measurement considerations (e.g. subjective vs. objective categories, self-reported vs. assessed/tested) while also providing a novel, detailed organisation of the environmental factors shaping healthy ageing at different levels of analysis (micro, meso and macro). Reflecting this multidimensional structure, the dataset complements the framework by documenting the specific tools employed in healthy ageing operationalisation, including assessment scales, validated instruments and measurement methodologies.
By systematically analysing how healthy ageing is operationalised across disciplines, the PIETHA framework and related dataset support the identification of conceptual, empirical and methodological gaps in healthy ageing research. This enables researchers to generate new hypotheses, explore underrepresented areas – such as environmental and psychosocial dimensions – and advance integrative metrics. The framework and dataset also enhance methodological transparency and data reuse due to a structured design anchored in thematic identification, construct classification and comprehensive mapping of measurement practices. This facilitates replication, comparison and innovation in the field by enabling the systematic evaluation of interconnected factors and measurement strategies across studies and contexts.
The framework and dataset offer a foundational resource to support the harmonisation of healthy ageing metrics and research strategies. They allow researchers and practitioners to: (i) obtain a structured overview of the concepts, constructs and measurement approaches used in healthy ageing research; (ii) assess the strengths and limitations of existing frameworks and methods, and compare approaches across domains
Data in Brief, vol. 62, Article 111916
Citations: 0
A curated crowdsourced dataset of Luganda and Swahili speech for text-to-speech synthesis
IF 1.4
Data in Brief Pub Date : 2025-07-23 DOI: 10.1016/j.dib.2025.111915
Andrew Katumba , Sulaiman Kagumire , Joyce Nakatumba-Nabende , John Quinn , Sudi Murindanyi
Abstract: This data article describes a curated, crowdsourced speech dataset in Luganda and Kiswahili, created to support text-to-speech (TTS) development in low-resource settings. The dataset is derived from Mozilla’s Common Voice corpus and includes only validated utterances from female speakers. A multi-step curation process was used to enhance the consistency and quality of the data. Speakers were first manually selected based on similarities in intonation, pitch, and rhythm, then validated using acoustic clustering with pitch features and mel-frequency cepstral coefficients (MFCCs). Audio files were preprocessed to remove leading and trailing silences using WebRTC voice activity detection, denoised with a causal waveform-based DEMUCS model, and filtered using WV-MOS, an automatic speech quality scoring tool. Only clips with a predicted MOS score of 3.5 or higher were retained. The final dataset contains over 19 h of Luganda and 15 h of Kiswahili recordings from six female speakers per language, each paired with a text transcription.
This dataset is designed to support speech generation research in Luganda and Kiswahili and enable reproducible experimentation in end-to-end TTS systems.
Data in Brief, vol. 62, Article 111915
Citations: 0
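The final filtering step in the abstract above (retain only clips with a predicted MOS of 3.5 or higher) could look roughly like the following. This is a minimal sketch, not the authors' pipeline: the `Clip` record, the file names, and the scores are hypothetical, and the scores are assumed to have been produced beforehand by an automatic quality predictor such as WV-MOS.

```python
from dataclasses import dataclass

MOS_THRESHOLD = 3.5  # retention threshold stated in the abstract


@dataclass
class Clip:
    path: str             # hypothetical clip file name
    predicted_mos: float  # hypothetical score from an automatic MOS predictor


def keep_high_quality(clips, threshold=MOS_THRESHOLD):
    """Return only the clips whose predicted MOS is at or above the threshold."""
    return [clip for clip in clips if clip.predicted_mos >= threshold]


# Made-up example scores, for illustration only.
clips = [
    Clip("lg_0001.wav", 4.1),
    Clip("lg_0002.wav", 3.5),
    Clip("lg_0003.wav", 2.9),
]
kept = keep_high_quality(clips)
print([clip.path for clip in kept])
```

The comparison is inclusive, so a clip scoring exactly 3.5 is kept, which matches the "3.5 or higher" wording.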
Tamarind health assessment dataset: Images of shelled, unshelled, and mixed tamarind pods
IF 1.4
Data in Brief Pub Date : 2025-07-23 DOI: 10.1016/j.dib.2025.111917
Amol Bhosle , Deepali Godse , Sandip Thite , Kailas Patil , Touhid Bhuiyan
Abstract: This data paper provides an image dataset of 8432 high-quality images of Tamarindus indica [1] (tamarind), categorized into six types: Shelled Healthy Single, Shelled Healthy Multiple, Unshelled Healthy Single, Unshelled Healthy Multiple, Shelled Unhealthy Single, and Shelled Unhealthy Multiple. The collection is intended primarily to assist agricultural research and machine learning applications for identifying and evaluating quality. Brightness and orientation vary within each category, so the collection covers a wide variety of images taken under controlled conditions. For accurate Tamarindus indica quality assessment, this dataset offers a useful resource for training and assessing computer vision models and machine learning techniques. Applications in agriculture are possible, enabling rapid, localized quality evaluation, with potential for broader industry adoption when adapted to other crops.
To improve plant quality assessment methods and contribute to trustworthy automated systems for Tamarindus indica quality evaluation, we invite researchers to explore this dataset and apply it creatively.
Data in Brief, vol. 62, Article 111917
Citations: 0
High-resolution reanalysis biological ocean data from the Copernicus Marine Service Information for Philippine marine research
IF 1.4
Data in Brief Pub Date : 2025-07-23 DOI: 10.1016/j.dib.2025.111913
Brenna Mei M. Concolis
Abstract: In the absence of observation data, remotely sensed data provide an effective alternative for characterizing the spatiotemporal dynamics and patterns of oceanographic variables. Some of the most important variables are biomass estimates, which describe the productivity of a given area. Analyzing data with such indices is a useful way to identify biological hotspots and shifts in concentrations that could be related to phenomena and changes in the climate. Because biomass patterns are crucial in coastal areas, it is important to use high-resolution data at high (daily) frequency to reduce bias and capture significant changes along the coast. The E.U. Copernicus Marine Service Information provides reanalysis data of global biomass content that can be freely accessed by public users. However, users without prior experience handling large data may have trouble accessing it because of the high resolution of the datasets. In addition, processing large data can be challenging for users with hardware limitations. This dataset is provided to help Philippine marine researchers work with net primary productivity, micronekton, and zooplankton, even if they have technical limitations. Daily values, monthly and annual means, climatologies (daily, monthly, and long-term), and anomalies (daily, monthly, and annual) are provided in the public repository.
The dataset will allow short-term and long-term analysis in Philippine waters.
Data in Brief, vol. 62, Article 111913
Citations: 0
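The abstract above lists monthly climatologies and anomalies among the provided products but does not say how they were computed. The sketch below assumes the conventional definitions: a monthly climatology is the mean of each calendar month across all years, and an anomaly is the value minus the climatology for its month. The function names and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def monthly_climatology(monthly_means):
    """Map {(year, month): value} to {month: mean of that month over all years}."""
    by_month = defaultdict(list)
    for (_year, month), value in monthly_means.items():
        by_month[month].append(value)
    return {month: mean(values) for month, values in by_month.items()}


def monthly_anomalies(monthly_means, climatology):
    """Subtract the calendar-month climatology from every monthly value."""
    return {
        (year, month): value - climatology[month]
        for (year, month), value in monthly_means.items()
    }


# Made-up net primary productivity values for one grid cell.
npp = {(2019, 1): 10.0, (2020, 1): 14.0, (2019, 2): 8.0, (2020, 2): 6.0}
clim = monthly_climatology(npp)
anom = monthly_anomalies(npp, clim)
print(clim)
print(anom)
```

The same pattern extends to daily climatologies by keying on day of year instead of month.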
Nonastreda multimodal dataset for efficient tool wear state monitoring
IF 1.4
Data in Brief Pub Date : 2025-07-23 DOI: 10.1016/j.dib.2025.111905
Hubert Truchan , Zahra Ahmadi
Abstract: With advancements in artificial intelligence (AI), there is a growing need to bridge the gap between multimodal learning capabilities and the availability of high-quality datasets for tool wear estimation. Industrial scenarios frequently require domain-specific knowledge, specialized datasets, and efficient deployment on resource-constrained edge devices that demand minimal memory, low latency, and optimized computational performance. While there has been a shift from unimodal sensor-based approaches to multisensory, multimodal strategies, this transition remains in its early stages. Developing feature extraction methods, multimodal fusion techniques, and correlation analysis frameworks is crucial for improving tool wear prediction models.
Existing multimodal open-source datasets have several limitations in addressing these challenges:
• They are often restricted to a specific set of data modalities, limiting adaptability.
• They primarily feature general-purpose objects, which are not well-suited for industrial applications requiring specialized domain knowledge.
• They lack support for lightweight models designed for real-time processing on edge devices.
• They lack in-depth documentation or dedicated data loaders, limiting reproducibility.
To bridge this gap, we introduce the Nonastreda Multimodal Dataset for efficient tool wear state monitoring. The dataset models the multimodal nature of tool wear progression in industrial milling processes, integrating nine data modalities. It comprises 512 samples, each containing RGB images of the shaft milling tool, workpiece, and material chip, along with three scalograms and three spectrograms derived from force signals. Data collection was performed using ten milling tools in an industrial production environment.
The dataset is designed to support classification tasks (sharp, used, dulled) and regression tasks predicting three target variables: flank wear (µm), gaps (µm), and overhang (µm). Each sample can be analyzed independently or as part of a temporally correlated sequence.
Accompanying scripts for data processing and analysis are available in the repository.
Data in Brief, vol. 62, Article 111905
Citations: 0
HER2-IHC-40x: A high-resolution histopathology dataset for HER2 IHC scoring in breast cancer
IF 1.4
Data in Brief Pub Date : 2025-07-23 DOI: 10.1016/j.dib.2025.111922
Md Serajun Nabi , Mohammad Faizal Ahmad Fauzi , Zaka Ur Rehman , Hezerul Bin Abdul Karim , Phaik-Leng Cheah , Seow-Fan Chiew , Lai-Meng Looi
Abstract: The HER2-IHC-40x and HER2-IHC-40x-WSI datasets are high-resolution whole slide image (WSI) and patch-extracted region collections for HER2 immunohistochemistry (IHC) scoring in breast cancer pathology. A total of 107 WSIs were scanned at 40× magnification, with Regions of Interest (ROIs) annotated by expert pathologists. Patches of 1024 × 1024 pixels were extracted from the ROIs and classified into four HER2 scores (0, 1+, 2+, 3+), yielding structured data for computational pathology analysis. Two splitting strategies were used: a WSI-based split, in which the data were split before the patches were extracted (named HER2-IHC-40x), and a patch-based split, in which the patches were extracted first and then split (named HER2-IHC-40x-WSI). A color-histogram filtering method was applied to remove non-tumour regions and artifacts, generating high-quality data. The dataset is applicable to deep learning applications, including HER2 classification and explainable AI.
It is freely available on Zenodo, with preprocessing scripts provided via GitHub, enabling reproducibility in digital pathology research.
Data in Brief, vol. 62, Article 111922
Citations: 0
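The difference between the two splitting strategies in the abstract above can be illustrated with a small sketch. It is not the authors' preprocessing code: the function names, the 20% test fraction, the seed, and the slide and patch identifiers are all assumptions made for illustration. Splitting at slide level guarantees that no slide contributes patches to both sets, whereas splitting at patch level gives no such guarantee.

```python
import random


def split_by_slide(patches, test_fraction=0.2, seed=0):
    """Split slides first, then assign each slide's patches to one side only."""
    slide_ids = sorted({slide for slide, _ in patches})
    random.Random(seed).shuffle(slide_ids)
    n_test = max(1, round(len(slide_ids) * test_fraction))
    test_slides = set(slide_ids[:n_test])
    train = [p for p in patches if p[0] not in test_slides]
    test = [p for p in patches if p[0] in test_slides]
    return train, test


def split_by_patch(patches, test_fraction=0.2, seed=0):
    """Shuffle all patches and split them, ignoring which slide they came from."""
    shuffled = list(patches)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical identifiers: 5 slides with 4 patches each.
patches = [(f"slide_{s}", f"patch_{p}") for s in range(5) for p in range(4)]

train_s, test_s = split_by_slide(patches)
train_p, test_p = split_by_patch(patches)

# Slides that appear on both sides of the slide-level split (always none).
shared_slides = {s for s, _ in train_s} & {s for s, _ in test_s}
print(len(train_s), len(test_s), shared_slides)
print(len(train_p), len(test_p))
```

A slide-level split is the usual choice when the goal is to estimate performance on unseen patients or slides, since patches from one slide tend to look alike.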
A spatial dataset on Ecuadorian cropping systems and theoretical crop residue potentials
IF 1.4
Data in Brief Pub Date : 2025-07-23 DOI: 10.1016/j.dib.2025.111910
Christhel Andrade Díaz , Ezequiel Zamora-Ledezma , Lorie Hamelin
Abstract: This dataset provides a high-resolution, spatially explicit baseline of Ecuadorian cropping systems and associated pedoclimatic conditions to support long-term modeling of soil organic carbon (SOC) dynamics and biomass resource planning. The dataset is built from national sources, including Ecuadorian agricultural statistics and crop production surveys spanning 2002–2019. Ten dominant crops, representing over 90% of the country’s cultivated area, are characterized across 23,021 Agricultural Pedoclimatic Units (APCUs), each defined by unique combinations of soil attributes, climate variables, and crop types. For each APCU, the dataset includes theoretical harvestable crop residue potentials, above- and belowground carbon inputs, and SOC-relevant parameters such as root depth distribution and biomass composition. Residue-to-product ratios (RPR), root-to-shoot biomass ratios (R:S), and biomass-to-carbon conversion coefficients were compiled through a comprehensive literature review and transparently documented. Additionally, the dataset includes monthly projections of average temperature, cumulative precipitation, and estimated evapotranspiration from 2020 to 2070 under the RCP4.5 climate scenario. Temperature and precipitation data were obtained from downscaled daily projections based on an ensemble of global climate models, and evapotranspiration was subsequently calculated using the Thornthwaite method. All variables were spatially assigned to each APCU.
This open-access dataset is designed for reuse in soil carbon modeling frameworks, supports the design of biomass mobilization strategies, and informs climate-smart land-use strategies in tropical agricultural systems.
Data in Brief, vol. 62, Article 111910
Citations: 0
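The abstract above names the Thornthwaite method for evapotranspiration without giving the exact variant used. As a rough illustration, the sketch below implements the standard textbook form of the Thornthwaite equation for monthly potential evapotranspiration, with no high-temperature correction. The function names, the default day length, and the temperature series are assumptions, not values from the dataset.

```python
def annual_heat_index(monthly_temps_c):
    """Thornthwaite heat index I: sum of (T/5)^1.514 over months with T > 0 degC."""
    return sum((t / 5.0) ** 1.514 for t in monthly_temps_c if t > 0)


def thornthwaite_pet(temp_c, heat_index, day_length_h=12.0, days_in_month=30):
    """Monthly potential evapotranspiration in mm (standard Thornthwaite form).

    temp_c        mean monthly air temperature (degC)
    heat_index    annual heat index I from annual_heat_index()
    day_length_h  mean daylight hours for the month (assumed 12 by default)
    """
    if temp_c <= 0 or heat_index <= 0:
        return 0.0
    i = heat_index
    exponent = 6.75e-7 * i**3 - 7.71e-5 * i**2 + 1.792e-2 * i + 0.49239
    day_correction = (day_length_h / 12.0) * (days_in_month / 30.0)
    return 16.0 * day_correction * (10.0 * temp_c / i) ** exponent


# Made-up climate: a constant 20 degC in every month of the year.
temps = [20.0] * 12
heat_index = annual_heat_index(temps)
pet_20 = thornthwaite_pet(20.0, heat_index)
print(round(heat_index, 1), round(pet_20, 1))
```

Because the method needs only temperature and day length, it is commonly applied to long climate projections where radiation, wind, and humidity fields are unavailable.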
A dataset of psychological emotional expressions relating to depression, anxiety and stress for Malay language model training
IF 1.4
Data in Brief Pub Date : 2025-07-22 DOI: 10.1016/j.dib.2025.111893
Ruhaila Maskat , Nor Hapiza Mohd Ariffin , Nurul Akhmal Dzulkefli
Abstract: The dataset is a collection of emotional expressions in the Malay language based on keywords established by the Malay DASS-42 psychological self-assessment survey. These keywords reflect the conditions of depression, anxiety and stress, known as subscales. In addition to the emotional expressions, the dataset is labelled with these subscales. The keyword determination was assisted by a psychiatrist. Further enrichment with synonyms was validated by a Malay linguist. Scraping was conducted from 2010 to 2019 using the Twitter API. The dataset can benefit research in emotional speech recognition, emotional intelligence understanding and emotional prediction. The dataset consists of raw and pre-processed posts, which include normalized and tokenized words suitable for training Malay Large Language Models and predictive analytics.
Data in Brief, vol. 62, Article 111893
Citations: 0