Data in Brief: Latest Articles, Page 5

Urban mobility insights: A dataset for exploring network topology and city dynamics

IF 1.4

Data in Brief Pub Date : 2025-09-18 DOI: 10.1016/j.dib.2025.112076

D.D. Herrera-Acevedo , D. Sierra-Porta

Abstract: This article presents a comprehensive dataset capturing the urban network structures and sociodemographic variables of 65 cities worldwide for the year 2023, based on the Urban Mobility Readiness Index (UMRi) developed by the Oliver Wyman Forum. The dataset includes key metrics such as graph entropy, node degree, clustering coefficient, graph diameter, GDP per capita, and population density, among others, which are essential for analysing the relationship between network topology and urban mobility readiness. By offering detailed insights into these urban networks, this dataset serves as a valuable resource for cities not currently included in major mobility rankings, allowing them to evaluate their mobility readiness in relation to established indices like the UMRi. Urban planners and researchers can leverage this data to explore complex urban mobility dynamics and develop strategies to enhance transportation systems, particularly in rapidly growing or underserved regions. The dataset is structured for seamless integration with various analytical tools, making it a vital asset for both urban planning and research aimed at fostering sustainable and efficient urban development.

Volume 63, Article 112076.
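The topology metrics named in the abstract above can be illustrated on a toy graph. The following is a minimal sketch, not the authors' pipeline: the abstract does not say which definition of graph entropy is used, so the Shannon entropy of the degree distribution is assumed here, and the function names and the four-node network are hypothetical.

```python
import math
from collections import Counter, deque


def degree_entropy(adj):
    """Shannon entropy (bits) of the degree distribution.

    Assumption: this is only one of several possible definitions of graph entropy.
    """
    degrees = [len(neighbours) for neighbours in adj.values()]
    counts = Counter(degrees)
    n = len(degrees)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for neighbours in adj.values():
        nbrs = list(neighbours)
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist between pairs of neighbours.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbrs[j] in adj[nbrs[i]]
        )
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def diameter(adj):
    """Longest shortest path, found by BFS from every node (connected graph assumed)."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


# Hypothetical toy network: a triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

print(degree_entropy(toy), average_clustering(toy), diameter(toy))
```

For a real street network the adjacency dictionary would be built from the city's road graph; a dedicated graph library would be the more practical choice at that scale.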

Citations: 0

Spanish is not just one: A dataset of Spanish dialect recognition for LLMs

IF 1.4

Data in Brief Pub Date : 2025-09-18 DOI: 10.1016/j.dib.2025.112088

Gonzalo Martínez , Marina Mayor-Rocher , Cris Pozo Huertas , Nina Melero , María Grandury , Pedro Reviriego

Abstract: This paper presents a dataset designed to assess the capability of Large Language Models (LLMs) in handling different Spanish dialects. While multilingualism is widely recognized as a crucial aspect of NLP, dialectal evaluation remains largely unexplored. Spanish, spoken by over 600 million people, exhibits significant lexical, morphological, and syntactic variation across regions. Recognizing these linguistic and cultural differences is essential for preserving smaller dialects, preventing their marginalization, and ensuring that Spanish is not reduced to a monolithic language. To address this gap, we introduce a dataset specifically designed to analyze whether LLMs can accurately identify different Spanish varieties while also measuring their potential preference for specific dialects. The dataset consists of 30 carefully crafted multiple-choice questions, requiring models to select the most appropriate option from different regional variations. Each question has been meticulously developed and reviewed by linguistic experts, undergoing multiple refinement cycles to ensure linguistic accuracy and effectiveness in detecting dialectal biases. This dataset represents an important step toward developing more inclusive and fair evaluation frameworks for Spanish Natural Language Processing (NLP). By identifying potential biases in LLMs and analyzing their ability to adapt to regional linguistic variations, this work contributes to the broader goal of equitable language representation in AI-driven text generation and comprehension tasks.

Volume 63, Article 112088.
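The two quantities the abstract above mentions, identification accuracy and dialect preference, could be tallied from model answers roughly as follows. This is a sketch under assumptions: the record layout (pairs of expected and chosen variety), the function name, and the sample answers are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def score_responses(responses):
    """Return (accuracy, preference counts) for (expected, chosen) dialect pairs."""
    correct = sum(1 for expected, chosen in responses if expected == chosen)
    accuracy = correct / len(responses)
    # How often each variety was picked, regardless of whether it was right.
    preference = Counter(chosen for _, chosen in responses)
    return accuracy, preference


# Hypothetical answers from one model on four questions.
responses = [
    ("Mexico", "Mexico"),
    ("Argentina", "Spain"),
    ("Spain", "Spain"),
    ("Chile", "Spain"),
]

accuracy, preference = score_responses(responses)
print(accuracy, preference.most_common())
```

A model that answers every question with the same variety would score well on that variety's questions while showing a skewed preference count, which is why the two numbers are worth reporting together.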

Citations: 0

Raw and pre-processed cruise passengers' GPS tracking datasets

IF 1.4

Data in Brief Pub Date : 2025-09-18 DOI: 10.1016/j.dib.2025.112078

Mauro Ferrante , Andrea Perri , Stefano De Cantis , Amit Birenboim , Noam Shoval

Abstract: The Global Positioning System (GPS) enables the precise collection of spatio-temporal data in real time, significantly enhancing our understanding of human mobility. GPS tracking data are spatially and temporally precise and can be supplemented using other sources. "Cruise passengers' behavior at the destination: Investigation using GPS technology" by De Cantis et al. (2016) exemplifies the application of this technology to gather insightful information on the spatio-temporal behaviour of cruise passengers at their destination. The study was the first to use GPS technology for analyzing the cruise tourism segment, setting a precedent in the field. Selected cruise ship passengers participated in a survey by completing initial and final questionnaires and carrying GPS data loggers during their visit. These loggers recorded geographic coordinates (latitude and longitude) along with timestamps at about ten-second intervals.

The passengers were selected using a pseudo-systematic sampling strategy, where about one out of every twenty passengers was sampled during the specified survey period. Beyond simply presenting the raw GPS data, this article also offers pre-processed data. A specially designed algorithm was employed to eliminate outliers and noise points and to impute missing values by means of dynamic moving medians. This algorithm detects and imputes noise points in GPS data by considering both temporal and spatial distances, effectively identifying abnormal observations caused by equipment failures or environmental interference. Its efficacy was demonstrated through tests conducted with data on cruise passengers' behavior in Palermo city (Italy).

Despite these advancements, processing GPS data for the study of tourism phenomena remains challenging. There are numerous potential metrics derivable from such data, which are crucial for understanding tourist behavior at destinations. Making these data freely available represents a significant contribution to the collaborative development of pre-processing methodologies and GPS data analysis techniques for the analysis of tourist behavior, and of human mobility in general.

The dataset holds high value for comprehending human mobility patterns and can be applied across various fields, including urban planning, transportation management, and tourism research. By systematically sampling and recording geographic coordinates along with timestamps, the dataset provides a robust foundation for the analysis of tourist mobility.

Volume 63, Article 112078.
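The idea of detecting noise points and imputing them with moving medians can be sketched as below. This is not the authors' algorithm, whose details the abstract does not give. It is a simplified stand-in that assumes a fixed window of neighbouring samples (reasonable only because fixes arrive about every ten seconds), a hypothetical 200 m threshold, and made-up coordinates near Palermo.

```python
import math
from statistics import median


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    radius = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def clean_track(points, half_window=2, max_jump_m=200.0):
    """Replace fixes that lie far from the local median position.

    points: list of (timestamp_s, lat, lon) tuples in time order.
    Returns (cleaned_points, flagged_indices).
    """
    cleaned, flagged = [], []
    for i, (t, lat, lon) in enumerate(points):
        lo = max(0, i - half_window)
        hi = min(len(points), i + half_window + 1)
        med_lat = median(p[1] for p in points[lo:hi])
        med_lon = median(p[2] for p in points[lo:hi])
        if haversine_m(lat, lon, med_lat, med_lon) > max_jump_m:
            # Treat the fix as noise and impute the windowed median position.
            cleaned.append((t, med_lat, med_lon))
            flagged.append(i)
        else:
            cleaned.append((t, lat, lon))
    return cleaned, flagged


# Hypothetical track sampled every 10 s, with one spurious fix at index 3.
lats = [38.1150, 38.1151, 38.1152, 38.2000, 38.1154, 38.1155, 38.1156]
track = [(10 * i, lat, 13.3610) for i, lat in enumerate(lats)]

cleaned, flagged = clean_track(track)
print(flagged, cleaned[3])
```

A version closer to the description in the abstract would size the window by elapsed time rather than by sample count, so that gaps in the recording do not pull distant fixes into the same median.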

Citations: 0

FAIRness and data quality assessment of urban air quality monitoring datasets: Perspective on insights from F-UJI evaluation.

IF 1.4

Data in Brief Pub Date : 2025-09-18 eCollection Date: 2025-10-01 DOI: 10.1016/j.dib.2025.112071

M S B Syed, Paula Kelly, Paul Stacey, Damon Berry

Abstract: Advancements in information technology have supported the open availability of environmental monitoring datasets to aid global initiatives such as the United Nations Sustainable Development Goals (UN SDGs). Despite these efforts, challenges concerning data quality and adherence to FAIR (Findable, Accessible, Interoperable, Reusable) principles continue to restrict the effective reuse of such datasets, particularly for secondary applications. This study uses the F-UJI assessment tool and a set of eight established DQ dimensions to evaluate the FAIRness and Data Quality (DQ) of four publicly available urban air quality monitoring datasets from international agencies. Each dataset was assessed against 17 FAIR metrics and scored accordingly. The FAIR assessments revealed moderate to low levels of compliance across datasets, with Reusable scores ranging from 2 to 3 out of 10, and Interoperability often being the weakest dimension. DQ analysis showed recurring issues in consistency, completeness, interpretability, and traceability, particularly where metadata was poorly structured or lacked semantic depth. While the scope is limited to four datasets, the results highlight common structural and semantic deficiencies hindering data reuse. Based on these findings, the study offers targeted recommendations to support improved metadata practices and better alignment with FAIR principles within the air quality monitoring subdomain.

Volume 62, Article 112071. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12495067/pdf/

Citations: 0

An RNA sequencing dataset from a porcine immortalized pre-adipocyte cell line

IF 1.4

Data in Brief Pub Date : 2025-09-18 DOI: 10.1016/j.dib.2025.112074

Susanna E. Riley, Thomas Thrower, Seungmee Lee, Cristina L. Esteves, F. Xavier Donadeu

Abstract: There is a significant need for livestock cell lines that can be robustly expanded in culture while maintaining their functional characteristics, both for use as in vitro models to understand animal physiology and for industrial applications such as in the emerging sector of cellular agriculture. Here we describe RNA sequencing datasets from a spontaneously immortalized pre-adipocyte line that was derived through serial passaging of porcine adipose-derived mesenchymal stromal cells (MSCs). This cell line, known as FaTTy, is unique in that it displays enhanced adipogenic capacity during long-term culture, characterised by a close to 100% differentiation efficiency and the ability to generate mature adipocytes. Bulk RNA sequencing was performed on FaTTy and parental MSCs. We present analysis of the raw files and bioinformatics analyses, including differential gene expression. These data provide valuable insight into the mechanisms of cell immortalization and the unique phenotype of the FaTTy cell line, with distinctive advantages as a prospective cell source for cultivated fat manufacture.

Volume 63, Article 112074.

Citations: 0

Whole-genome sequencing data of Salmonella enterica subsp. enterica serovar Enteritidis strain SSTRA25 isolated from a pediatric bacteremia case in Mosul, Iraq.

IF 1.4

Data in Brief Pub Date : 2025-09-17 eCollection Date: 2025-10-01 DOI: 10.1016/j.dib.2025.112077

Shymaa F Yonis, Sura I Khudher, Talal S Salih, Rayan M Faisal, Ayman M Khaleel

Abstract: Salmonella enterica subsp. enterica serovar Enteritidis is a well-known non-typhoidal serovar, commonly associated with foodborne illnesses. Here, we report the draft genome sequence of Salmonella enterica subsp. enterica serovar Enteritidis strain SSTRA25, isolated from a pediatric patient with bacteremia in Mosul, Iraq. The genome was sequenced using the Illumina NovaSeq 6000 platform. The assembled and annotated genome comprised 4,733,231 bp with 40 contigs, and a GC content of 52.12%. It contains 4580 coding sequences (CDSs), 69 tRNAs, 9 rRNAs, 14 ncRNAs, 2 CRISPR arrays, and 368 annotated subsystems. The analysis of antimicrobial resistance genes revealed multiple genes associated with various drug classes, including phenicols, penicillin beta-lactams, cephalosporins, carbapenems, and monobactams, with perfect sequence matches. In addition, chromosomal point mutations linked to antimicrobial resistance were identified with significant sequence similarity. Salmonella enterica subsp. enterica serovar Enteritidis SSTRA25 showed a high predicted human pathogenicity score (0.941) and carried multiple virulence factors, including SspH2, SopA, SadA, ShdA, MisL, and several flagellar and outer membrane proteins. In the pathogenic landscape, the closest strain was Salmonella enterica subsp. enterica serovar Holcomb NY_FSL C7-1028, with a Minkowski distance of 0.024804. The genome sequence of Salmonella enterica subsp. enterica serovar Enteritidis SSTRA25 has been deposited in NCBI under the accession number JBNHMR000000000.

Volume 62, Article 112077. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12495041/pdf/
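Two of the summary statistics quoted in the abstract above, GC content and Minkowski distance, are simple to compute. The sketch below shows the standard definitions on toy inputs. It is illustrative only: the abstract does not state the order p or the feature vectors used by the pathogenicity tool, so p = 2 and the example vectors are assumptions, and the function names are hypothetical.

```python
def gc_content(sequence):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * gc / acgt


def minkowski(u, v, p=2):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


# Toy inputs, not taken from the SSTRA25 genome.
print(gc_content("ATGCGC"))          # 4 of 6 bases are G or C
print(minkowski((0, 0), (3, 4)))     # Euclidean case, p = 2
print(minkowski((0, 0), (3, 4), 1))  # Manhattan case, p = 1
```

For a multi-contig draft assembly, the counts would be accumulated over all contigs before dividing, rather than averaging per-contig percentages.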

Citations: 0

Computational dataset of cetyl trimethyl ammonium-anion complex structures and simulated infrared spectra.

IF 1.4

Data in Brief Pub Date : 2025-09-17 eCollection Date: 2025-10-01 DOI: 10.1016/j.dib.2025.112072

Arun K Sharma, Owen McMillan

Abstract: This dataset presents computationally derived geometries and vibrational spectra of cetyl trimethyl ammonium ion pairs formed with three environmentally and chemically relevant anions: bisulfate, nitrate, and methyl sulfonate. Data were obtained from quantum chemical calculations performed with the Gaussian 16 software package, using the ONIOM method at the second-order Møller-Plesset perturbation theory (MP2) level. To account for solvent effects and electrostatic screening, calculations incorporated an implicit solvent environment via the polarizable continuum model (PCM). The dataset includes optimized geometric coordinates, harmonic vibrational frequencies, and simulated infrared (IR) spectra for each cetyl trimethyl ammonium-anion pair. Additional computational details, including input parameters, computational settings, and scripts used for data processing, are provided as supplementary material to ensure complete transparency and reproducibility. Data collection involved initial construction and optimization of the molecular geometries of ion pairs, followed by calculation of vibrational modes at equilibrium. Distance-displacement scans were conducted and the ONIOM total electronic energies and geometries of these non-stationary structures are reported. This comprehensive dataset offers significant reuse potential for researchers investigating surfactant chemistry, computational spectroscopy, environmental chemistry, and related disciplines.

Volume 62, Article 112072. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12495042/pdf/

Citations: 0

Phenotypic data on seedling traits of hexaploid spring wheat panel evaluated under heat stress.

IF 1.4

Data in Brief Pub Date : 2025-09-16 eCollection Date: 2025-10-01 DOI: 10.1016/j.dib.2025.112069

Santosh Gudi, Jatinder Singh, Harsimardeep Gill, Sunish Sehgal, Justin D Faris, Upinder Gill, Rajeev Gupta

Abstract: Heat stress is a major abiotic stress affecting wheat at various developmental stages, including the seedling and reproductive stages. Heat stress at early developmental stages affects seed germination and seedling establishment, thereby reducing grain yield per unit area. To overcome the negative impact of heat stress, it is crucial to identify sources of heat-tolerant germplasm and introduce them into breeding programs. In this study, we evaluated a global diversity panel of 216 hexaploid spring wheat accessions, comprising landraces and cultivars, under non-heat stress (23 °C) and heat stress (36 °C) treatments. Phenotypic data were collected after 13 days of heat stress on various seedling traits, including coleoptile length (CL; cm), shoot length (SL; cm), root length (RL; cm), tiller number (TN), shoot fresh weight (SFW; mg), and root fresh weight (RFW; mg). Heat stress negatively affected all the seedling traits, with the maximum effect on RL (85.6 % reduction) and the minimum effect on CL (15.44 %). However, the RN was increased by 20 % under heat stress. The effect of heat stress was also greater on root traits (such as RL and RFW) than on shoot traits (such as SL and SFW). This suggests that, compared to roots, shoots may have adaptive mechanisms, such as transpiration cooling via stomatal regulation, to alleviate the negative impacts of heat stress. Moreover, the raw phenotypic data were subjected to mixed linear analysis to derive best linear unbiased estimates (BLUEs). BLUE values were further used to assess the intrinsic relationship among the seedling traits under non-heat stress (23 °C) and heat stress (36 °C) treatments. The dataset presented in this study serves as a valuable source for identifying lines that are extremely tolerant to heat stress, which can be utilized in breeding programs to develop heat-resilient, high-yielding wheat cultivars. Moreover, this dataset helps in identifying potential genomic regions associated with improved heat stress tolerance, which can be incorporated in marker-assisted breeding of heat-tolerant wheat varieties.

Volume 62, Article 112069. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12493248/pdf/
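The percentage effects quoted in the abstract above are relative changes of the heat-stress mean against the non-stress mean. The following is a small worked sketch: the trait means are hypothetical values chosen only to reproduce an 85.6 % reduction and a 20 % increase, not figures from the dataset, and the function name is an assumption.

```python
def percent_change(control_mean, stress_mean):
    """Relative change (%) of the stress mean versus the control mean.

    Negative values are reductions, positive values are increases.
    """
    return (stress_mean - control_mean) / control_mean * 100.0


# Hypothetical means for two traits (not taken from the dataset).
root_length_change = percent_change(20.0, 2.88)  # about an 85.6 % reduction
count_trait_change = percent_change(5.0, 6.0)    # about a 20 % increase

print(round(root_length_change, 2), round(count_trait_change, 2))
```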

Citations: 0

Student migration trends in Kazakhstan: A dataset by settlement and region (2020-2024)

IF 1.4

Data in Brief Pub Date : 2025-09-16 DOI: 10.1016/j.dib.2025.112059

Anel Tarakbay, Alexander Pilipenko, Assel Duisengali

Abstract: This data article presents a dataset on Kazakhstan's student migration within the country using the National Education Database (NEDB) for the years 2020-2024. The dataset includes transfers among schools for the country as a whole and year-by-year observations for the settlement type (urban or rural) and the region. To date, no publicly accessible datasets have been released at a comparable level of detail for educational mobility in Kazakhstan.

The data were derived from the NEDB, a centralized administrative system where educational organizations report student-level data in real time, which ensures the relevance of the data. The entered data are confirmed by electronic digital signatures of the heads of organizations, which also supports their relevance. The database contains detailed information about each student of the educational organization and their longitudinal education information. While the raw data in the NEDB are administrative records, the student migration data were derived from these records. Using the longitudinal data, the following were calculated: general migration trends by year (2020-2024), overall migration between regions, migration in high mobility areas, migration between rural and urban areas, and migration trends by grade level of secondary school students. General migration trend data by year and overall migration data between regions can further be aggregated into bigger administrative units if needed.

This dataset was compiled as part of a broader research program on the education system of Kazakhstan that aims to forecast student enrollment, teacher requirements, and school infrastructure requirements. The gathered data also support school capacity planning, the evaluation of infrastructure burden, and determination of future demand for teachers and related educational resources.

Volume 63, Article 112059.
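The aggregations listed in the abstract above (transfers by year, between regions, and between settlement types) amount to counting transfer records along different keys. A minimal sketch follows, assuming a hypothetical record layout: the field names and the sample records are invented for illustration and do not reflect the NEDB schema.

```python
from collections import Counter


def migration_flows(transfers):
    """Count transfers by year, by region pair, and by settlement-type pair."""
    by_year = Counter()
    by_region_pair = Counter()
    by_settlement = Counter()
    for record in transfers:
        by_year[record["year"]] += 1
        # Only moves that cross a regional boundary count as inter-regional.
        if record["from_region"] != record["to_region"]:
            by_region_pair[(record["from_region"], record["to_region"])] += 1
        by_settlement[(record["from_type"], record["to_type"])] += 1
    return by_year, by_region_pair, by_settlement


# Hypothetical transfer records (field names are assumptions).
transfers = [
    {"year": 2020, "from_region": "Region A", "to_region": "Region B",
     "from_type": "rural", "to_type": "urban"},
    {"year": 2020, "from_region": "Region A", "to_region": "Region A",
     "from_type": "rural", "to_type": "urban"},
    {"year": 2021, "from_region": "Region B", "to_region": "Region A",
     "from_type": "urban", "to_type": "urban"},
]

by_year, by_region_pair, by_settlement = migration_flows(transfers)
print(by_year, by_region_pair, by_settlement)
```

Rolling the regional counts up into larger administrative units, as the abstract suggests is possible, would only require mapping each region name to its parent unit before counting.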

Citations: 0

Grain by grain: A microscopic image dataset of rice varieties from Bangladeshi rice markets

IF 1.4

Data in Brief Pub Date : 2025-09-16 DOI: 10.1016/j.dib.2025.112058

Md Tahsin , Kazi Isat Mahazabin , Maksura Binte Rabbani Nuha , Akil Rahman Efad , Mariya Rahman Momo , Nishat Tasnim Niloy , M. Saddam Hossain Khan , Rashedul Amin Tuhin , Mohammad Rifat Ahmmad Rashid , Raihan Ul Islam

Abstract: Although rice is the staple food of Bangladesh, the practice of cutting and polishing rice to make it look more attractive, i.e., thinner and shinier, leads to serious health concerns. While the motivation behind this process is to increase market value, it leads to the loss of minerals, nutrients, vitamins, and fibers, leaving only carbohydrates. This article presents an extensive dataset of images of rice collected from the local market in Dhaka, Bangladesh, captured using high-resolution microscopic cameras. This dataset features more than 200 images each from 10 different types of rice, totalling approximately 2010 images. After augmentation, each folder was expanded to about 800 images, totalling 8010 augmented images. Each of the images provides a detailed view of the structure of each rice grain after the milling process and shows the result of the polishing process, which proves the nutritional loss. The sole purpose of presenting the dataset is to prove the hidden impact of the milling process on rice and to serve as a valuable resource for further study on rice quality and overall food security.

Volume 63, Article 112058.

Citations: 0