Scientific Data, Pub Date: 2025-03-21, DOI: 10.1038/s41597-025-04456-4
Neema Mduma, Christian Elinisa
{"title":"Banana Leaves Imagery Dataset.","authors":"Neema Mduma, Christian Elinisa","doi":"10.1038/s41597-025-04456-4","DOIUrl":"10.1038/s41597-025-04456-4","url":null,"abstract":"<p><p>In this work, we present a dataset of banana leaf imagery, both with and without diseases. The dataset consists of 11,767 images, categorized as follows: 3,339 healthy images, 3,496 images of leaves affected by Black Sigatoka and 4,932 images of leaves affected by Fusarium Wilt Race 1. This data was collected to support machine learning diagnostics for disease detection. The data collection process involved farmers, researchers, agricultural experts and plant pathologists from the northern and southern highland regions of Tanzania. To ensure unbiased representation, farms were randomly selected from the Rungwe, Mbeya, Arumeru, and Arusha districts, based on the presence of banana crops and the targeted diseases. The dataset offers a comprehensive collection of images captured from November 2022 to January 2023, using a high-resolution smartphone camera across a wide geographical area. 
Researchers and developers can use this dataset to build machine learning solutions that automatically detect diseases in images, potentially enabling agricultural stakeholders, including farmers, to diagnose Fusarium Wilt Race 1 and Black Sigatoka early and take timely action.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"477"},"PeriodicalIF":5.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928457/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143677161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data, Pub Date: 2025-03-21, DOI: 10.1038/s41597-025-04798-z
Ruochen Ren, Zhipeng Wang, Chaoyun Yang, Jiahang Liu, Rong Jiang, Yanmin Zhou, Shuo Jiang, Bin He
{"title":"Enhancing robotic skill acquisition with multimodal sensory data: A novel dataset for kitchen tasks.","authors":"Ruochen Ren, Zhipeng Wang, Chaoyun Yang, Jiahang Liu, Rong Jiang, Yanmin Zhou, Shuo Jiang, Bin He","doi":"10.1038/s41597-025-04798-z","DOIUrl":"10.1038/s41597-025-04798-z","url":null,"abstract":"<p><p>The advent of large language models has transformed human-robot interaction by enabling robots to execute tasks via natural language commands. However, these models primarily depend on unimodal data, which limits their ability to integrate diverse and essential environmental, physiological, and physical data. To address the limitations of current unimodal datasets, this paper investigates novel and comprehensive multimodal data collection methodologies that can capture the complexity of human interaction in real-world kitchen environments. Data related to the use of 17 different kitchen tools by 20 adults in dynamic scenarios were collected, including human tactile information, EMG signals, audio data, whole-body movement, and eye-tracking data. The dataset comprises 680 segments (~11 hours) with data across seven modalities and includes 56,000 detailed annotations. 
This paper bridges the gap between real-world multimodal data and embodied AI, paving the way for a new benchmark in utility and repeatability for skill learning in robotics areas.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"476"},"PeriodicalIF":5.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928623/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143677165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data, Pub Date: 2025-03-21, DOI: 10.1038/s41597-025-04778-3
Qian Zhao, Zixu Fan, Hui Yu, Zhanli Wang
{"title":"The high-quality chromosome-level genome assembly of Dracocephalum rupestre Hance.","authors":"Qian Zhao, Zixu Fan, Hui Yu, Zhanli Wang","doi":"10.1038/s41597-025-04778-3","DOIUrl":"10.1038/s41597-025-04778-3","url":null,"abstract":"<p><p>Dracocephalum rupestre Hance is a traditional Chinese herbal medicine in the family Labiatae with numerous health benefits, including anti-inflammatory, antiviral and anti-tumor activities. However, the genus Dracocephalum currently has no reference genome, which restricts research on the breeding, cultivation and exploration of medicinal properties in D. rupestre. Thus, we present the high-quality chromosome-level genome assembly of D. rupestre using a combination of PacBio HiFi sequencing and Hi-C scaffolding technologies. The final genome was 435.45 Mb with a contig N50 of 49.83 Mb and a scaffold N50 of 59.06 Mb. The assembled sequences were anchored to 7 chromosomes with an integration efficiency of 96.96%. Furthermore, we predicted 25,865 protein-coding genes, 98.23% of which were functionally annotated. These results offer valuable resources for understanding the genetic basis of the unique phenotypes of D. rupestre and will facilitate further study of the functional genomics of this species.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"472"},"PeriodicalIF":5.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928585/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143677167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data, Pub Date: 2025-03-21, DOI: 10.1038/s41597-025-04812-4
Jiangping Shu, Yongxia Zhang, Tengbo Huang, Yuehong Yan
{"title":"The chromosome-level genome assembly of Broad-Leaf Fern (Dipteris shenzhenensis).","authors":"Jiangping Shu, Yongxia Zhang, Tengbo Huang, Yuehong Yan","doi":"10.1038/s41597-025-04812-4","DOIUrl":"10.1038/s41597-025-04812-4","url":null,"abstract":"<p><p>Dipteris is a relic plant genus and an important indicator of global climate warming and plant geography during the Mesozoic era. However, the lack of genomic resources has hindered the study of paleoclimate, systematic evolution, and medicinal value of this genus. Here, we sequenced and assembled the first chromosome-level genome of Dipteris shenzhenensis. The assembled genome was 1.9 Gb with a contig N50 length of 4.75 Mb, GC content of 42.28% and BUSCO value of 98.3%, and 98.37% of the assembled sequences were anchored onto 33 pseudochromosomes. 71.97% of the genome was predicted to consist of repetitive sequences, and 45 telomeres were identified, including 15 paired telomeres. A total of 26,471 protein-coding genes were predicted, of which 24,485 (92.5%) were functionally annotated. The first high-quality genome of Dipteris will provide important genome resources for understanding the systematic evolution, paleoclimate and medicinal value of ferns.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"475"},"PeriodicalIF":5.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928520/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143677166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data, Pub Date: 2025-03-21, DOI: 10.1038/s41597-025-04740-3
Neda Sadeghi, Isabelle F van der Velpen, Bradley T Baker, Ishaan Batta, Kyle J Cahill, Sarah Genon, Ethan McCormick, Léa C Michel, Dustin Moraczewski, Masoud Seraji, Philip Shaw, Rogers F Silva, Najme Soleimani, Emma Sprooten, Øystein Sørensen, Adam G Thomas, Audrey Thurm, Zi-Xuan Zhou, Vince D Calhoun, Rogier Kievit, Anna Plachti, Xi-Nian Zuo, Tonya White
{"title":"The interplay between brain and behavior during development: A multisite effort to generate and share simulated datasets.","authors":"Neda Sadeghi, Isabelle F van der Velpen, Bradley T Baker, Ishaan Batta, Kyle J Cahill, Sarah Genon, Ethan McCormick, Léa C Michel, Dustin Moraczewski, Masoud Seraji, Philip Shaw, Rogers F Silva, Najme Soleimani, Emma Sprooten, Øystein Sørensen, Adam G Thomas, Audrey Thurm, Zi-Xuan Zhou, Vince D Calhoun, Rogier Kievit, Anna Plachti, Xi-Nian Zuo, Tonya White","doi":"10.1038/s41597-025-04740-3","DOIUrl":"10.1038/s41597-025-04740-3","url":null,"abstract":"<p><p>One of the challenges in the field of neuroimaging is that we often lack knowledge about the underlying truth and whether our methods can detect developmental changes. To address this gap, five research groups around the globe created simulated datasets embedded with their assumptions of the interplay between brain development, cognition, and behavior. Each group independently created the datasets, unaware of the approaches and assumptions made by the other groups. Each group simulated three datasets with the same variables, each with 10,000 participants over 7 longitudinal waves, ranging from 7 to 20 years of age. The independently created datasets include demographic data and brain-derived variables, along with behavior and cognition variables. 
These datasets and code that were used to generate the datasets can be downloaded and used by the research community to apply different longitudinal models to determine the underlying patterns and assumptions where the ground truth is known.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"473"},"PeriodicalIF":5.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928570/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143677169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data, Pub Date: 2025-03-21, DOI: 10.1038/s41597-025-04773-8
Jumei Zhang, Qing Xu, Lei You, Bin Li, Zezhi Zhang, Wenyao Lin, Xiangyin Luo, Zhengxiu Ye, Lanlan Zheng, Chen Li, Junpeng Niu, Guodong Wang, Honghong Hu, Chao Zhou, Yonghong Zhang
{"title":"Chromosome-scale genome assembly and annotation of Huzhang (Reynoutria japonica).","authors":"Jumei Zhang, Qing Xu, Lei You, Bin Li, Zezhi Zhang, Wenyao Lin, Xiangyin Luo, Zhengxiu Ye, Lanlan Zheng, Chen Li, Junpeng Niu, Guodong Wang, Honghong Hu, Chao Zhou, Yonghong Zhang","doi":"10.1038/s41597-025-04773-8","DOIUrl":"10.1038/s41597-025-04773-8","url":null,"abstract":"<p><p>Reynoutria japonica, commonly known as Huzhang or Japanese knotweed, is a perennial herbaceous plant belonging to the family Polygonaceae and order Caryophyllales. This plant is valued for its traditional medicinal uses in China. In this study, we present a high-quality, chromosome-scale reference assembly for R. japonica using a combination of PacBio long-read sequencing, Hi-C reads, and Illumina short-read sequencing. The final assembled genome spans approximately 3.30 Gb, with a contig N50 of 1.39 Mb. Notably, 99.22% of the assembled sequences were anchored to 22 pseudo-chromosomes, and 74.79% of the genome is composed of repetitive elements. Genome annotation revealed 68,646 protein-coding genes and 14,788 non-coding RNAs. This genomic resource provides a robust foundation for comparative genomics and will enable deep insights into the evolutionary relationships across related species.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"474"},"PeriodicalIF":5.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928576/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143677162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data, Pub Date: 2025-03-21, DOI: 10.1038/s41597-025-04710-9
Nicholas S Marzolf, Weston M Slaughter, Michael J Vlah, Spencer A Rhea, Amanda G DelVecchia, Emily S Bernhardt
{"title":"Ecosystem metabolism estimates from the National Ecological Observatory Network (NEON) stream and river sites.","authors":"Nicholas S Marzolf, Weston M Slaughter, Michael J Vlah, Spencer A Rhea, Amanda G DelVecchia, Emily S Bernhardt","doi":"10.1038/s41597-025-04710-9","DOIUrl":"10.1038/s41597-025-04710-9","url":null,"abstract":"<p><p>Expanded availability of estimates of ecosystem metabolism and gas exchange from the world's streams and rivers is rapidly revising estimates of river contributions to global carbon budgets. Here, we present estimates of gross primary production, ecosystem respiration, and gas exchange from 27 streams and rivers across North America, including Puerto Rico, using data from the National Ecological Observatory Network (NEON). Further, we explore how aggregating and processing input data influences model outputs, expanding the methodological knowledge in approaching sensor collection and manipulation for ecosystem-scale modelling. We apply filters to input data to determine how different approaches to quality control of raw data influence the quantity and precision of estimates of ecosystem metabolism. 
Model estimates are high priority measures of ecosystem function that integrate additional NEON data products that will allow further understanding of stream and river biogeochemistry and ecosystem function across time and space.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"478"},"PeriodicalIF":5.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928455/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143677164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data, Pub Date: 2025-03-20, DOI: 10.1038/s41597-025-04757-8
Yan Zhang, Mei-Po Kwan, Libo Fang
{"title":"An LLM driven dataset on the spatiotemporal distributions of street and neighborhood crime in China.","authors":"Yan Zhang, Mei-Po Kwan, Libo Fang","doi":"10.1038/s41597-025-04757-8","DOIUrl":"10.1038/s41597-025-04757-8","url":null,"abstract":"<p><p>Crime is a significant social, economic, and legal issue. This research presents an open-access spatiotemporal repository of street and neighborhood crime data, comprising approximately one million records of crimes in China, with specific geographic coordinates (latitude and longitude) and timestamps for each incident. The dataset is based on publicly available law court judgment documents. Artificial intelligence (AI) technologies are employed to extract crime events at the neighborhood or even building level from vast amounts of unstructured judicial text. This dataset enables more precise spatial analysis of crime incidents, offering valuable insights across interdisciplinary fields such as economics, sociology, and geography. It contributes significantly to the achievement of the United Nations Sustainable Development Goals (SDGs), particularly in fostering sustainable cities and communities, and plays a crucial role in advancing efforts to reduce all forms of violence and related mortality rates.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"467"},"PeriodicalIF":5.8,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11926219/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143670929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data, Pub Date: 2025-03-20, DOI: 10.1038/s41597-025-04775-6
Jay Patrikar, Joao Dantas, Brady Moon, Milad Hamidi, Sourish Ghosh, Nikhil Keetha, Ian Higgins, Atharva Chandak, Takashi Yoneyama, Sebastian Scherer
{"title":"Image, speech, and ADS-B trajectory datasets for terminal airspace operations.","authors":"Jay Patrikar, Joao Dantas, Brady Moon, Milad Hamidi, Sourish Ghosh, Nikhil Keetha, Ian Higgins, Atharva Chandak, Takashi Yoneyama, Sebastian Scherer","doi":"10.1038/s41597-025-04775-6","DOIUrl":"10.1038/s41597-025-04775-6","url":null,"abstract":"<p><p>We introduce TartanAviation, an open-source multi-modal dataset focused on terminal-area airspace operations. TartanAviation provides a holistic view of the airport environment by concurrently collecting image, speech, and ADS-B trajectory data using setups installed inside airport boundaries. The datasets were collected at both towered and non-towered airfields across multiple months to capture diversity in aircraft operations, seasons, aircraft types, and weather conditions. In total, TartanAviation provides 3.1M images, 3374 hours of Air Traffic Control speech data, and 661 days of ADS-B trajectory data. In addition to the raw data, we provide post-processed versions with synchronized, filtered, and interpolated data. In addition to the dataset, we also open-source the code-base used to collect and pre-process the dataset, further enhancing accessibility and usability. 
We believe this dataset has many potential use cases and would be particularly vital in allowing AI and machine learning technologies to be integrated into air traffic control systems and advance the adoption of autonomous aircraft in the airspace.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"468"},"PeriodicalIF":5.8,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11926361/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143670930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data, Pub Date: 2025-03-20, DOI: 10.1038/s41597-025-04665-x
Anna Simson, Anil Yildiz, Julia Kowalski
{"title":"Reusability-targeted enrichment of sea ice core data.","authors":"Anna Simson, Anil Yildiz, Julia Kowalski","doi":"10.1038/s41597-025-04665-x","DOIUrl":"10.1038/s41597-025-04665-x","url":null,"abstract":"<p><p>The Reusability-targeted Enriched Sea Ice Core Database (RESICE) combines data and metadata from 287 sea ice cores. The database enables reuse scenarios such as the validation of physics-based models and the training of data-driven algorithms. RESICE is enriched in two ways. First, RESICE combines data and metadata originating from 138 sources including 107 data sets from the repositories Zenodo, Australian Antarctic Data Center and Pangaea. Second, RESICE contains additional automatically generated metadata tailored to specific reuse scenarios. RESICE is checked for plausibility and consistency, and it allows transparent retracing of each data point to its source. RESICE is accessible via Zenodo and the MOSAiC webODV, and it is extendable through the pyresice Python package. In addition to describing RESICE, we formalize the reuse perspective of an agnostic reuser, uninvolved in data acquisition, and we discuss the process of the cross-source and -repository combination of the database. 
Despite sources adhering to FAIR, this process is challenging and time-intensive due to the heterogeneity of the sources and their mismatch with reuse requirements.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"465"},"PeriodicalIF":5.8,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11926198/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143670934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}