{"title":"Dynamic key vascular anatomy dataset for D2 lymph node dissection during laparoscopic gastric cancer surgery.","authors":"Longfei Gou, Haolin Wu, Chang Chen, Jiayu Lai, Hua Yang, Yuqing Qiu, Boer Su, Hongyu Wang, Bingyu Zhao, Xin Ye, Jinming Li, Xiaobing Bao, Guoxin Li, Jiang Yu, Yanfeng Hu, Qi Dou, Hao Chen","doi":"10.1038/s41597-025-05255-7","DOIUrl":"https://doi.org/10.1038/s41597-025-05255-7","url":null,"abstract":"<p><p>Gastric cancer (GC) is the fifth most common malignant tumor worldwide. Surgical resection remains the primary treatment for GC, with laparoscopic surgery recommended by several international guidelines. Due to complex perigastric vessels, standard D2 lymph node dissection (LND) in laparoscopic GC (LapGC) surgery is challenging. Careful dissection is required to expose, dissect, and ligate vessels without injury, ensuring radical LND. Computer vision has the potential to assist in the identification of key vessels during LapGC surgery, thereby reducing the risk of vascular injury. However, existing publicly available surgical anatomy datasets mainly focus on organ segmentation and simple surgeries. To address the clinical challenges and research needs outlined above, we present the LapGC Key Vascular Anatomy Dataset (LapGC-KVAD-30). This dataset was extracted from thirty complete surgical videos and contains annotations for fifteen types of key vessels across eight D2 LND scenes. 
The LapGC-KVAD-30 uniquely contains 5303 frames that showcase the dynamic process of key vessels from initial appearance to full exposure (or ligation), providing essential information for effective and safe LND.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"903"},"PeriodicalIF":5.8,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144183125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data. Pub Date: 2025-05-29. DOI: 10.1038/s41597-025-05213-3
Cristiano A Köhler, Sonja Grün, Michael Denker
{"title":"Improving data sharing and knowledge transfer via the Neuroelectrophysiology Analysis Ontology (NEAO).","authors":"Cristiano A Köhler, Sonja Grün, Michael Denker","doi":"10.1038/s41597-025-05213-3","DOIUrl":"https://doi.org/10.1038/s41597-025-05213-3","url":null,"abstract":"<p><p>Describing the analysis of data from electrophysiology experiments investigating the function of neural systems is challenging. On the one hand, data can be analyzed by distinct methods with similar purposes, such as different algorithms to estimate the spectral power content of a measured time series. On the other hand, different software codes can implement the same analysis algorithm, while adopting different names to identify functions and parameters. These ambiguities complicate reporting analysis results, e.g., in a manuscript or on a scientific platform. Here, we illustrate how an ontology to describe the analysis process can assist in improving clarity, rigour and comprehensibility by complementing, simplifying and classifying the details of the implementation. We implemented the Neuroelectrophysiology Analysis Ontology (NEAO) to define a vocabulary and to standardize the descriptions of processes for neuroelectrophysiology data analysis. Real-world examples demonstrate how NEAO can annotate provenance information describing an analysis. 
Based on such provenance, we detail how it supports querying information (e.g., using knowledge graphs) that enable researchers to find, understand and reuse analysis results.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"907"},"PeriodicalIF":5.8,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144180416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chromosome-level genome assembly of Paracoccus marginatus based on PacBio and Hi-C technologies.","authors":"Jiufeng Wei, Jinying Xue, Xuejie Shen, Gaoxiang Zhang, Qing Zhao, Yunyun Lu, Minmin Niu, Wei Ji","doi":"10.1038/s41597-025-04944-7","DOIUrl":"https://doi.org/10.1038/s41597-025-04944-7","url":null,"abstract":"<p><p>Invasive species pose a serious threat to ecosystems and biodiversity, leading to considerable economic losses for countries. The papaya mealybug (Paracoccus marginatus) is a prominent invasive pest that affects over 200 plant species and has been recorded in more than 60 countries and regions. Here, the chromosome-level genome of P. marginatus was assembled using PacBio and Hi-C technologies. The resulting genome, with a total size of 213.81 Mb, was organized into four chromosomes. The contig and scaffold N50 values were 20.2 Mb and 48.01 Mb, respectively. The genome assembly attained a BUSCO completeness score of 95.5%, and CEGMA analysis showed that 99.56% of the genome was thoroughly annotated. It includes 13,367 predicted protein-coding genes, with 49.26% of the assembly identified as repetitive sequences. This high-quality genome serves as a valuable resource for a range of research fields, such as population genetics, evolutionary studies, invasive species management, and comparative genomics within Hemiptera and other insect groups.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"901"},"PeriodicalIF":5.8,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144181683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data. Pub Date: 2025-05-29. DOI: 10.1038/s41597-025-05239-7
Hyunji Ha, Ken G Sweat, Kendra D Conrow, Richard S Haney, Thomas M Cahill, David S LeBauer, Maxwell C K Leung
{"title":"Remediating toxic elements with sunflower, hemp, castor bean, & bamboo: an open dataset of harmonized variables.","authors":"Hyunji Ha, Ken G Sweat, Kendra D Conrow, Richard S Haney, Thomas M Cahill, David S LeBauer, Maxwell C K Leung","doi":"10.1038/s41597-025-05239-7","DOIUrl":"https://doi.org/10.1038/s41597-025-05239-7","url":null,"abstract":"<p><p>This dataset was compiled between August 1, 2022, and March 15, 2023, through a comprehensive literature review of 587 studies on the uptake of elements from the soil by plants (i.e., phytoremediation). As a proof of concept, we compiled research results on four commodity crops suitable for phytoremediation in semi-arid environments, namely sunflower, hemp, castor bean, and bamboo. Two hundred thirty-eight studies had data on soil types, elemental pollution, and plant components for calculating bioconcentration factors. Using a harmonized set of variables, we extracted data from these studies to create a database to organize results for interpretation and enable consistent and further literature analysis. This approach can help industry experts and environmental researchers select crops for their intended extraction applications, as well as provide insights into the bioaccumulation of toxic elements in plants.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"905"},"PeriodicalIF":5.8,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144182546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data. Pub Date: 2025-05-29. DOI: 10.1038/s41597-025-05230-2
Renata C Asprino, Liming Cai, Yujing Yan, Peter J Flynn, Lucas C Marinho, Xiaoshan Duan, Christiane Anderson, Goia M Lyra, Charles C Davis, Bruno A S de Medeiros
{"title":"A curated benchmark dataset for molecular identification based on genome skimming.","authors":"Renata C Asprino, Liming Cai, Yujing Yan, Peter J Flynn, Lucas C Marinho, Xiaoshan Duan, Christiane Anderson, Goia M Lyra, Charles C Davis, Bruno A S de Medeiros","doi":"10.1038/s41597-025-05230-2","DOIUrl":"https://doi.org/10.1038/s41597-025-05230-2","url":null,"abstract":"<p><p>Genome skimming is a promising sequencing strategy for DNA-based taxonomic identification. However, the lack of standardized datasets for benchmarking genome skimming tools presents a challenge in comparing new methods to existing ones. As part of the development of varKoder, a new tool for DNA-based identification, we curated four datasets designed for comparing molecular identification tools using low-coverage genomes. These datasets comprise vast phylogenetic and taxonomic diversity from closely related species to all taxa currently represented on NCBI SRA. One of them consists of novel sequences from taxonomically verified samples in the plant clade Malpighiales, while the other three datasets compile publicly available data. All include raw genome skim sequences to enable comprehensive testing and validation of a variety of molecular species identification methods. We also provide the two-dimensional graphical representations of genomic data used in varKoder. 
These datasets represent a reliable resource for researchers to assess the accuracy, efficiency, and robustness of new tools relative to varKoder and other methods in a consistent and reproducible manner.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"906"},"PeriodicalIF":5.8,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144182506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data. Pub Date: 2025-05-29. DOI: 10.1038/s41597-025-05178-3
Weitao Chen, Chao Li, Rong Yang, Yuefei Li, Baosheng Wu, Jie Li
{"title":"Haplotype resolved chromosome-level genome assembly of the gold barb (Barbodes semifasciolatus).","authors":"Weitao Chen, Chao Li, Rong Yang, Yuefei Li, Baosheng Wu, Jie Li","doi":"10.1038/s41597-025-05178-3","DOIUrl":"https://doi.org/10.1038/s41597-025-05178-3","url":null,"abstract":"<p><p>The gold barb (Barbodes semifasciolatus), a member of the Cyprinidae family, exhibits remarkable adaptability to highly acidic environments, making it an ideal model for studying extreme environmental adaptation. However, its genome has not been previously characterized. To address this, we assembled a high-quality chromosome-scale genome for B. semifasciolatus using High-Fidelity (HiFi) sequencing and Hi-C technology. The resulting haplotype-resolved assemblies, spanning 776 Mb and 779 Mb across 25 chromosomes, achieved genome coverages of 99.5% and 99.7%, respectively, and included four gap-free chromosomes. Genome quality assessment using BUSCO indicated a high completeness score of 98.2% for haplotype1 and 98.3% for haplotype2, further validated by strong synteny with the zebrafish (Danio rerio), confirming the assembly's integrity and continuity. Through integration of full-length transcriptome data, RNA sequencing, and homology-based annotation, we identified 26,057 protein-coding genes with 2,087 pseudogenes in haplotype 2, and 25,622 protein-coding genes with 2,101 pseudogenes in haplotype 1. 
This high-resolution genome assembly is a crucial resource for advancing research in the Cyprinidae, particularly for understanding adaptive evolution in extreme environments.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"902"},"PeriodicalIF":5.8,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144183700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data. Pub Date: 2025-05-29. DOI: 10.1038/s41597-025-05185-4
Hossein Kabir, Jordan Wu, Sunav Dahal, Tony Joo, Nishant Garg
{"title":"SorpVision: A Comprehensive Dataset for Cementitious Sorptivity Analysis Powered by Computer Vision.","authors":"Hossein Kabir, Jordan Wu, Sunav Dahal, Tony Joo, Nishant Garg","doi":"10.1038/s41597-025-05185-4","DOIUrl":"https://doi.org/10.1038/s41597-025-05185-4","url":null,"abstract":"<p><p>As the construction industry advances toward more efficient methods for assessing durability, the need for automated sorptivity evaluation has become increasingly critical. Consequently, this study introduces SorpVision, a dataset of 7,384 images (5,000 real and 2,384 synthetic) designed to support our custom computer vision-based framework for automated sorptivity evaluation in cementitious materials. Traditional methods, such as ASTM C1585, depend on manual weighing, which is time-consuming and limits measurement intervals. SorpVision, combined with a cost-effective USB camera setup and a robust vision algorithm, facilitates real-time water level detection in cementitious systems. The framework, trained using 1,440 data points from pastes with water-to-cement (w/c) ratios of 0.4-0.8 and curing durations of 1-7 days, achieves high predictive accuracy for initial and secondary sorptivities (R<sup>2</sup> > 0.9 for cement pastes). Moreover, it generalizes well to mortar and concrete, yielding R<sup>2</sup> values of 0.96 and 0.87 for initial sorptivity and 0.74 and 0.65 for secondary sorptivity, respectively. 
SorpVision offers an accurate, data-driven foundation for scalable, automated durability evaluations, supporting sustainable infrastructure development.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"904"},"PeriodicalIF":5.8,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144181822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A benchmark dataset for class-wise segmentation of construction and demolition waste in cluttered environments.","authors":"Diani Sirimewan, Sanuwani Dayarathna, Sudharshan Raman, Yu Bai, Mehrdad Arashpour","doi":"10.1038/s41597-025-05243-x","DOIUrl":"10.1038/s41597-025-05243-x","url":null,"abstract":"<p><p>Efficient management of construction and demolition waste (CDW) is essential for enhancing resource recovery. The lack of publicly available, high-quality datasets for waste recognition limits the development and adoption of automated waste handling solutions. To facilitate data sharing and reuse, this study introduces 'CDW-Seg', a benchmark dataset for class-wise segmentation of CDW. The dataset comprises high-resolution images captured at authentic construction sites, featuring skip bins filled with a diverse mixture of CDW materials in-the-wild. It includes 5,413 manually annotated objects across ten categories: concrete, fill dirt, timber, hard plastic, soft plastic, steel, fabric, cardboard, plasterboard, and the skip bin, representing a total of 2,492,021,189 pixels. Each object was meticulously annotated through semantic segmentation, providing reliable ground-truth labels. To demonstrate the applicability of the dataset, an adapter-based fine-tuning approach was implemented using a hierarchical Vision Transformer, ensuring computational efficiency suitable for deployment in automated waste handling scenarios. 
The CDW-Seg has been made publicly accessible to promote data sharing, facilitate further research, and support the development of automated solutions for resource recovery.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"885"},"PeriodicalIF":5.8,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12120074/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144174719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data. Pub Date: 2025-05-28. DOI: 10.1038/s41597-025-05097-3
Cameron Bracken, Youngjun Son, Daniel Broman, Nathalie Voisin
{"title":"GODEEEP-hydro: Historical and projected power system ready hydropower data for the United States.","authors":"Cameron Bracken, Youngjun Son, Daniel Broman, Nathalie Voisin","doi":"10.1038/s41597-025-05097-3","DOIUrl":"10.1038/s41597-025-05097-3","url":null,"abstract":"<p><p>Hydropower is a critical electricity resource in the United States which, in addition to low-cost electricity generation, provides valuable ancillary grid services, and supports the integration of nondispatchable weather-dependent resources (e.g., wind and solar). Despite its value to the grid, there are very few comprehensive datasets available from which to study both historical and future impacts of climate, weather driven energy droughts, and integration of other weather driven generation. In this paper, we present a hydropower generation dataset covering 1,452 hydroelectric plants in the contiguous U.S. The dataset contains monthly and weekly hydropower generation estimates for both historical (1982-2019) and future (2020-2099) periods which includes 4 future climate scenarios. In addition, this dataset provides weekly and monthly constraints such as minimum and maximum power which are particularly useful in power system models which are used to study grid reliability, transmission planning and capacity expansion.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"875"},"PeriodicalIF":5.8,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12120057/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144174748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}