Scientific Data | Pub Date: 2024-11-06 | DOI: 10.1038/s41597-024-04049-7
Neil Byers, Charles Parker, Chris Beecroft, T B K Reddy, Hugh Salamon, George Garrity, Kjiersten Fagnan
{"title":"Identifying genomic data use with the Data Citation Explorer.","authors":"Neil Byers, Charles Parker, Chris Beecroft, T B K Reddy, Hugh Salamon, George Garrity, Kjiersten Fagnan","doi":"10.1038/s41597-024-04049-7","DOIUrl":"10.1038/s41597-024-04049-7","url":null,"abstract":"Increases in sequencing capacity, combined with rapid accumulation of publications and associated data resources, have increased the complexity of maintaining associations between literature and genomic data. As the volume of literature and data has exceeded the capacity of manual curation, automated approaches to maintaining and confirming associations among these resources have become necessary. Here we present the Data Citation Explorer (DCE), which discovers literature incorporating genomic data that was not formally cited. This service provides advantages over manual curation methods, including consistent resource coverage, metadata enrichment, documentation of new use cases, and identification of conflicting metadata. The service reduces labor costs associated with manual review, improves the quality of genome metadata maintained by the U.S. Department of Energy Joint Genome Institute (JGI), and increases the number of known publications that incorporate its data products. The DCE facilitates an understanding of JGI impact, improves credit attribution for data generators, and can encourage data sharing by allowing scientists to see how reuse amplifies the impact of their original studies.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1200"},"PeriodicalIF":5.8,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11541499/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142591275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-06 | DOI: 10.1038/s41597-024-04001-9
Yu Zou, Jingqiang Fu, Yuan Liang, Xuan Luo, Minghui Shen, Miaoqin Huang, Yexin Chen, Weiwei You, Caihuan Ke
{"title":"Chromosome-level genome assembly of the ivory shell Babylonia areolata.","authors":"Yu Zou, Jingqiang Fu, Yuan Liang, Xuan Luo, Minghui Shen, Miaoqin Huang, Yexin Chen, Weiwei You, Caihuan Ke","doi":"10.1038/s41597-024-04001-9","DOIUrl":"10.1038/s41597-024-04001-9","url":null,"abstract":"The ivory shell Babylonia areolata is an economically important marine benthic gastropod known for its rapid growth and high nutritional value. B. areolata is distributed in Southeast Asia and the southeast coastal areas of China. In this study, we constructed a high-quality genome for B. areolata using PacBio, Illumina, and Hi-C sequencing technologies. The genome assembly comprised 35 chromosomal sequences with a total length of 1.65 Gb. The scaffold and contig N50 lengths were 53.17 Mb and 2.64 Mb, respectively, with repeat sequences constituting 64.46% of the genome. Furthermore, 26,130 protein-coding genes and 96.75% of the genome's BUSCOs were identified. This inaugural report of a B. areolata genome provides crucial foundational information for further investigations into the biology, genomics, and genetic improvement of economic traits of this species.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1201"},"PeriodicalIF":5.8,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11542075/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142591269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-05 | DOI: 10.1038/s41597-024-04050-0
Thomas E Blanford, David P Williams, J Daniel Park, Brian T Reinhardt, Kyle S Dalton, Shawn F Johnson, Daniel C Brown
{"title":"An in-air synthetic aperture sonar dataset of target scattering in environments of varying complexity.","authors":"Thomas E Blanford, David P Williams, J Daniel Park, Brian T Reinhardt, Kyle S Dalton, Shawn F Johnson, Daniel C Brown","doi":"10.1038/s41597-024-04050-0","DOIUrl":"10.1038/s41597-024-04050-0","url":null,"abstract":"This paper describes a synthetic aperture sonar (SAS) dataset collected in air, consisting of four types of targets in four environments of different complexity. The in-air, laboratory-based experiments produced data with a level of fidelity and ground truth accuracy that is not easily attainable in data collected underwater. The range of complexity, high level of data fidelity, and accurate ground truth provide a rich dataset with acoustic features on multiple scales. It can be used to develop new signal-processing and image reconstruction algorithms, as well as machine learning models for object detection and classification. It may also find application in model verification and validation for acoustic simulators. The dataset consists of raw acoustic time series returns, associated environmental conditions, hardware configuration, array motion, as well as the reconstructed imagery.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1196"},"PeriodicalIF":5.8,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538465/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142582986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-05 | DOI: 10.1038/s41597-024-04034-0
Komlavi Akpoti, Naga Manohar Velpuri, Naoki Mizukami, Stefanie Kagone, Mansoor Leh, Kirubel Mekonnen, Afua Owusu, Primrose Tinonetsana, Michael Phiri, Lahiru Madushanka, Tharindu Perera, Paranamana Thilina Prabhath, Gabriel E L Parrish, Gabriel B Senay, Abdulkarim Seid
{"title":"Advancing water security in Africa with new high-resolution discharge data.","authors":"Komlavi Akpoti, Naga Manohar Velpuri, Naoki Mizukami, Stefanie Kagone, Mansoor Leh, Kirubel Mekonnen, Afua Owusu, Primrose Tinonetsana, Michael Phiri, Lahiru Madushanka, Tharindu Perera, Paranamana Thilina Prabhath, Gabriel E L Parrish, Gabriel B Senay, Abdulkarim Seid","doi":"10.1038/s41597-024-04034-0","DOIUrl":"10.1038/s41597-024-04034-0","url":null,"abstract":"VegDischarge v1, which covers over 64,000 river segments in Africa, is a natural river discharge dataset produced for the period 2001-2021 by coupling the agro-hydrologic VegET model with the mizuRoute routing model. Using remote sensing data and a hydrological modeling system, the 1-km runoff field simulated by VegET was routed with mizuRoute. Performance metrics show strong model reliability, with R² of 0.5-0.9, NSE of 0.6-0.9, and KGE of 0.5-0.8 at the continental scale. The total average annual discharge for Africa is quantified at 3271.4 km³·year⁻¹, with contributions to oceanic basins: 1000.0 km³·year⁻¹ to the North Atlantic, primarily from the Senegal, Gambia, Volta, and Niger Rivers; 1327.2 km³·year⁻¹ to the South Atlantic, largely from the Congo River; 214.7 km³·year⁻¹ to the Mediterranean Sea, predominantly from the Nile River; and 729.4 km³·year⁻¹ to the Indian Ocean, with inputs from rivers such as the Zambezi. The dataset is valuable for stakeholders and researchers seeking to understand water availability and its temporal and spatial variations, which affect water-related infrastructure planning, sustainable resource allocation, and the development of climate resilience strategies.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1195"},"PeriodicalIF":5.8,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538507/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142582719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A long-term high-resolution dataset of grasslands grazing intensity in China.","authors":"Daju Wang, Qiongyan Peng, Xiangqian Li, Wen Zhang, Xiaosheng Xia, Zhangcai Qin, Peiyang Ren, Shunlin Liang, Wenping Yuan","doi":"10.1038/s41597-024-04045-x","DOIUrl":"10.1038/s41597-024-04045-x","url":null,"abstract":"Grazing is a significant anthropogenic disturbance to grasslands, impacting their function and composition, and affecting carbon budgets and greenhouse gas emissions. However, accurate evaluations of grazing impacts are limited by the absence of long-term high-resolution grazing intensity data (i.e., the number of livestock per unit area). This study utilized census livestock data and a satellite-based vegetation index to develop the first Long-term High-resolution Grazing Intensity (LHGI) dataset of grassland in seven pastoral provinces in western China from 1980 to 2022. The LHGI dataset effectively captured spatial variations in grazing intensity, with validation at 73 sites showing a coefficient of determination (R²) of 0.78. The county-level validation showed an average R² of 0.73 ± 0.03 from 1980 to 2022. This dataset serves as a vital resource for estimating grassland carbon cycling and livestock system CH₄ emissions, as well as contributing to grassland management.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1194"},"PeriodicalIF":5.8,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538541/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142581804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-05 | DOI: 10.1038/s41597-024-04055-9
Karel Hynek, Jan Luxemburk, Jaroslav Pešek, Tomáš Čejka, Pavel Šiška
{"title":"Author Correction: CESNET-TLS-Year22: A year-spanning TLS network traffic dataset from backbone lines.","authors":"Karel Hynek, Jan Luxemburk, Jaroslav Pešek, Tomáš Čejka, Pavel Šiška","doi":"10.1038/s41597-024-04055-9","DOIUrl":"10.1038/s41597-024-04055-9","url":null,"abstract":"","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1199"},"PeriodicalIF":5.8,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538410/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142582927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-05 | DOI: 10.1038/s41597-024-04048-8
Ruth H Thurstan, Hannah McCormick, Joanne Preston, Elizabeth C Ashton, Floris P Bennema, Ana Bratoš Cetinić, Janet H Brown, Tom C Cameron, Fiz da Costa, David W Donnan, Christine Ewers, Tomaso Fortibuoni, Eve Galimany, Otello Giovanardi, Romain Grancher, Daniele Grech, Maria Hayden-Hughes, Luke Helmer, K Thomas Jensen, José A Juanes, Janie Latchford, Alec B M Moore, Dimitrios K Moutopoulos, Pernille Nielsen, Henning von Nordheim, Bárbara Ondiviela, Corina Peter, Bernadette Pogoda, Bo Poulsen, Stéphane Pouvreau, Cordula Scherer, Aad C Smaal, David Smyth, Åsa Strand, John A Theodorou, Philine S E Zu Ermgassen
{"title":"Historical dataset details the distribution, extent and form of lost Ostrea edulis reef ecosystems.","authors":"Ruth H Thurstan, Hannah McCormick, Joanne Preston, Elizabeth C Ashton, Floris P Bennema, Ana Bratoš Cetinić, Janet H Brown, Tom C Cameron, Fiz da Costa, David W Donnan, Christine Ewers, Tomaso Fortibuoni, Eve Galimany, Otello Giovanardi, Romain Grancher, Daniele Grech, Maria Hayden-Hughes, Luke Helmer, K Thomas Jensen, José A Juanes, Janie Latchford, Alec B M Moore, Dimitrios K Moutopoulos, Pernille Nielsen, Henning von Nordheim, Bárbara Ondiviela, Corina Peter, Bernadette Pogoda, Bo Poulsen, Stéphane Pouvreau, Cordula Scherer, Aad C Smaal, David Smyth, Åsa Strand, John A Theodorou, Philine S E Zu Ermgassen","doi":"10.1038/s41597-024-04048-8","DOIUrl":"10.1038/s41597-024-04048-8","url":null,"abstract":"Ocean ecosystems have been subjected to anthropogenic influences for centuries, but the scale of past ecosystem changes is often unknown. For centuries, the European flat oyster (Ostrea edulis), an ecosystem engineer providing biogenic reef habitats, was a culturally and economically significant source of food and trade. These reef habitats are now functionally extinct, and almost no memory of where or at what scales this ecosystem once existed, or its past form, remains. The described datasets present qualitative and quantitative extracts from written records published between 1524 and 2022. These show: (1) locations of past flat oyster fisheries and/or oyster reef habitat described across its biogeographical range, with associated levels of confidence; (2) reported extent of past oyster reef habitats; and (3) species associated with these habitats. These datasets will be of use to inform accelerating flat oyster restoration activities, to establish reference models for anchoring adaptive management of restoration action, and in contributing to global efforts to recover records on the hidden history of anthropogenic-driven ocean ecosystem degradation.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1198"},"PeriodicalIF":5.8,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538340/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142582932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-04 | DOI: 10.1038/s41597-024-04033-1
Julan Kim, Yoonsik Kim, Jeongwoen Shin, Yeong-Kuk Kim, Doo Ho Lee, Jong-Won Park, Dain Lee, Hyun-Chul Kim, Jeong-Ho Lee, Seung Hwan Lee, Jun Kim
{"title":"Fully phased genome assemblies and graph-based genetic variants of the olive flounder, Paralichthys olivaceus.","authors":"Julan Kim, Yoonsik Kim, Jeongwoen Shin, Yeong-Kuk Kim, Doo Ho Lee, Jong-Won Park, Dain Lee, Hyun-Chul Kim, Jeong-Ho Lee, Seung Hwan Lee, Jun Kim","doi":"10.1038/s41597-024-04033-1","DOIUrl":"10.1038/s41597-024-04033-1","url":null,"abstract":"The olive flounder, Paralichthys olivaceus, also known as the Korean halibut, is an economically important flatfish in East Asian countries. Here, we provided four fully phased genome assemblies of two different olive flounder individuals using high-fidelity long-read sequencing and their parental short-read sequencing data. We obtained 42-44 Gb of ~15-kb and ~Q30 high-fidelity long reads, and their assembly quality values were ~53. We annotated ~30 K genes, ~170-Mb repetitive sequences, and ~3 M 5-methylcytosine positions for each genome assembly, and established a graph-based draft pan-genome of the olive flounder. We identified 5 M single-nucleotide variants and 100 K structural variants with their genotype information, where ~13% of the variants were possibly fixed in the two Korean individuals. Based on our chromosome-level genome assembly, we also explored chromosome evolution in the order Pleuronectiformes, as reported earlier. Our high-quality genomic resources will contribute to future genomic selection for accelerating the breeding process of the olive flounder.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1193"},"PeriodicalIF":5.8,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11535246/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142576855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-02 | DOI: 10.1038/s41597-024-03951-4
Otávio Napoli, Dami Duarte, Patrick Alves, Darlinne Hubert Palo Soto, Henrique Evangelista de Oliveira, Anderson Rocha, Levy Boccato, Edson Borin
{"title":"A benchmark for domain adaptation and generalization in smartphone-based human activity recognition.","authors":"Otávio Napoli, Dami Duarte, Patrick Alves, Darlinne Hubert Palo Soto, Henrique Evangelista de Oliveira, Anderson Rocha, Levy Boccato, Edson Borin","doi":"10.1038/s41597-024-03951-4","DOIUrl":"10.1038/s41597-024-03951-4","url":null,"abstract":"Human activity recognition (HAR) using smartphone inertial sensors, like accelerometers and gyroscopes, enhances smartphones' adaptability and user experience. Data distribution from these sensors is affected by several factors including sensor hardware, software, device placement, user demographics, terrain, and more. Most datasets focus on providing variability in user and (sometimes) device placement, limiting domain adaptation and generalization studies. Consequently, models trained on one dataset often perform poorly on others. Despite many publicly available HAR datasets, cross-dataset generalization remains challenging due to data format incompatibilities, such as differences in measurement units, sampling rates, and label encoding. Hence, we introduce the DAGHAR benchmark, a curated collection of datasets for domain adaptation and generalization studies in smartphone-based HAR. We standardized six datasets in terms of accelerometer units, sampling rate, gravity component, activity labels, user partitioning, and time window size, removing trivial biases while preserving intrinsic differences. This enables controlled evaluation of model generalization capabilities. Additionally, we provide baseline performance metrics from state-of-the-art machine learning models, crucial for comprehensive evaluations of generalization in HAR tasks.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1192"},"PeriodicalIF":5.8,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11531562/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142564888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}