Scientific Data | Pub Date: 2024-11-16 | DOI: 10.1038/s41597-024-04112-3
Nisha Krishnan, Sandhya Sukumaran, V G Vysakh, Wilson Sebastian, Anjaly Jose, Neenu Raj, A Gopalakrishnan
{"title":"De novo transcriptome analysis of the Indian squid Uroteuthis duvaucelii (Orbigny, 1848) from the Indian Ocean.","authors":"Nisha Krishnan, Sandhya Sukumaran, V G Vysakh, Wilson Sebastian, Anjaly Jose, Neenu Raj, A Gopalakrishnan","doi":"10.1038/s41597-024-04112-3","DOIUrl":"10.1038/s41597-024-04112-3","url":null,"abstract":"<p><p>Cephalopods have dominated the oceans for hundreds of millions of years and are unquestionably at the peak of molluscan evolution. The development of a large brain and a sophisticated sensory system contributed significantly to their success. Therefore, they are considered the best example of convergent evolution and have attracted the attention of scientists from various disciplines of biology. The aim of the present study is to construct a reference transcriptome for the Indian squid Uroteuthis duvaucelii to gain insights into cephalopod evolution and enrich the existing cephalopod database. Around 72 million short Illumina reads were generated from five different tissues, including the brain, eye, gill, heart and gonads, and assembled using the Trinity assembler. About 26230 protein-coding sequences were annotated from the assembled transcripts. The BUSCO completeness of the assembly was 71.71% compared to the Mollusca_Odb10 gene set. KEGG and REACTOME pathway analyses revealed that U. 
duvaucelii shares many genes and pathways with higher vertebrates.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1236"},"PeriodicalIF":5.8,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11569149/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142644767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal single-cell RNA sequencing dataset of gastroesophagus development from embryonic to post-natal stages.","authors":"Pon Ganish Prakash, Naveen Kumar, Rajendra Kumar Gurumurthy, Cindrilla Chumduri","doi":"10.1038/s41597-024-04081-7","DOIUrl":"10.1038/s41597-024-04081-7","url":null,"abstract":"<p><p>Gastroesophageal disorders and cancers impose a significant global burden. Particularly, the prevalence of esophageal adenocarcinoma (EAC) has increased dramatically in recent years. Barrett's esophagus, a precursor of EAC, features a unique tissue adaptation at the gastroesophageal squamo-columnar junction (GE-SCJ), where the esophagus meets the stomach. Investigating the evolution of GE-SCJ and understanding dysregulation in its homeostasis are crucial for elucidating cancer pathogenesis. Here, we present the technical quality of the comprehensive single-cell RNA sequencing (scRNA-seq) dataset from mice that captures the transcriptional dynamics during the development of the esophagus, stomach and the GE-SCJ at embryonic, neonatal and adult stages. Through integration with external scRNA-seq datasets and validations using organoid and animal models, we demonstrate the dataset's consistency in identified cell types and transcriptional profiles. This dataset will be a valuable resource for studying developmental patterns and associated signaling networks in the tissue microenvironment. 
By offering insights into cellular programs during homeostasis, it facilitates the identification of changes leading to conditions like metaplasia and cancer, crucial for developing effective intervention strategies.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1238"},"PeriodicalIF":5.8,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11569200/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142644769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-16 | DOI: 10.1038/s41597-024-04059-5
Martha Dellar, Gertjan Geerling, Kasper Kok, Peter M van Bodegom, Gerard van der Schrier, Maarten Schrama, Eline Boelee
{"title":"Future land use maps for the Netherlands based on the Dutch One Health Shared Socio-economic Pathways.","authors":"Martha Dellar, Gertjan Geerling, Kasper Kok, Peter M van Bodegom, Gerard van der Schrier, Maarten Schrama, Eline Boelee","doi":"10.1038/s41597-024-04059-5","DOIUrl":"10.1038/s41597-024-04059-5","url":null,"abstract":"<p><p>To enable detailed study of a wide variety of future health challenges, we have created future land use maps for the Netherlands for 2050, based on the Dutch One Health Shared Socio-economic Pathways (SSPs). This was done using the DynaCLUE modelling framework. Future land use is based on altitude, soil properties, groundwater, salinity, flood risk, agricultural land price, distance to transport hubs and climate. We also account for anticipated demand for different land use types, historic land use changes and potential spatial restrictions. These land use maps can be used to model many different health risks to people, animals and the environment, such as disease, water quality and pollution. In addition, the Netherlands can serve as an example for other rapidly urbanising deltas where many of the health risks will be similar.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1237"},"PeriodicalIF":5.8,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11569152/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142644768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-16 | DOI: 10.1038/s41597-024-04079-1
Simonas Kecorius, Leizel Madueño, Mario Lovric, Nikolina Racic, Maximilian Schwarz, Josef Cyrys, Juan Andrés Casquero-Vera, Lucas Alados-Arboledas, Sébastien Conil, Jean Sciare, Jakub Ondracek, Anna Gannet Hallar, Francisco J Gómez-Moreno, Raymond Ellul, Adam Kristensson, Mar Sorribas, Nikolaos Kalivitis, Nikolaos Mihalopoulos, Annette Peters, Maria Gini, Konstantinos Eleftheriadis, Stergios Vratolis, Kim Jeongeun, Wolfram Birmili, Benjamin Bergmans, Nina Nikolova, Adelaide Dinoi, Daniele Contini, Angela Marinoni, Andres Alastuey, Tuukka Petäjä, Sergio Rodriguez, David Picard, Benjamin Brem, Max Priestman, David C Green, David C S Beddows, Roy M Harrison, Colin O'Dowd, Darius Ceburnis, Antti Hyvärinen, Bas Henzing, Suzanne Crumeyrolle, Jean-Philippe Putaud, Paolo Laj, Kay Weinhold, Kristina Plauškaitė, Steigvilė Byčenkienė
{"title":"Atmospheric new particle formation identifier using longitudinal global particle number size distribution data.","authors":"Simonas Kecorius, Leizel Madueño, Mario Lovric, Nikolina Racic, Maximilian Schwarz, Josef Cyrys, Juan Andrés Casquero-Vera, Lucas Alados-Arboledas, Sébastien Conil, Jean Sciare, Jakub Ondracek, Anna Gannet Hallar, Francisco J Gómez-Moreno, Raymond Ellul, Adam Kristensson, Mar Sorribas, Nikolaos Kalivitis, Nikolaos Mihalopoulos, Annette Peters, Maria Gini, Konstantinos Eleftheriadis, Stergios Vratolis, Kim Jeongeun, Wolfram Birmili, Benjamin Bergmans, Nina Nikolova, Adelaide Dinoi, Daniele Contini, Angela Marinoni, Andres Alastuey, Tuukka Petäjä, Sergio Rodriguez, David Picard, Benjamin Brem, Max Priestman, David C Green, David C S Beddows, Roy M Harrison, Colin O'Dowd, Darius Ceburnis, Antti Hyvärinen, Bas Henzing, Suzanne Crumeyrolle, Jean-Philippe Putaud, Paolo Laj, Kay Weinhold, Kristina Plauškaitė, Steigvilė Byčenkienė","doi":"10.1038/s41597-024-04079-1","DOIUrl":"10.1038/s41597-024-04079-1","url":null,"abstract":"<p><p>Atmospheric new particle formation (NPF) is a naturally occurring phenomenon, during which high concentrations of sub-10 nm particles are created through gas-to-particle conversion. NPF is observed in multiple environments around the world. Although it has an observable influence on annual total and ultrafine particle number concentrations (PNC and UFP, respectively), only a limited number of epidemiological studies have investigated whether these particles are associated with adverse health effects. One plausible reason for this limitation may be the absence of NPF identifiers in UFP and PNC data sets. Until recently, regional NPF events were usually identified manually from particle number size distribution contour plots. Identification of NPF across multi-annual and multiple station data sets remained a tedious task. 
In this work, we introduce a regional NPF identifier, created using an automated, machine learning based algorithm. The regional NPF event tag was created for 65 measurement sites globally, covering the period from 1996 to 2023. The discussed data set can be used in future studies related to regional NPF.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1239"},"PeriodicalIF":5.8,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11569151/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142644765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-16 | DOI: 10.1038/s41597-024-04119-w
Seung Jae Lee, Minjoo Cho, Jinmu Kim, Eunkyung Choi, Soyun Choi, Sangdeok Chung, Jaebong Lee, Jeong-Hoon Kim, Hyun Park
{"title":"Chromosome-level genome assembly and annotation of the Patagonian toothfish Dissostichus eleginoides.","authors":"Seung Jae Lee, Minjoo Cho, Jinmu Kim, Eunkyung Choi, Soyun Choi, Sangdeok Chung, Jaebong Lee, Jeong-Hoon Kim, Hyun Park","doi":"10.1038/s41597-024-04119-w","DOIUrl":"10.1038/s41597-024-04119-w","url":null,"abstract":"<p><p>The Patagonian toothfish (Dissostichus eleginoides) belongs to the class Actinopterygii and the suborder Notothenioidei, and lives in cold waters in the Southern Hemisphere. We performed assembly and annotation, integrating Illumina short-read sequencing for polishing, PacBio long-read sequencing for contig-level assembly, and Hi-C sequencing technology to obtain a high-quality chromosome-level genome assembly. The final assembly comprised a total of 495 scaffolds, with a genome size of 844.7 Mbp and an N50 length of 36 Mbp. Among these, we confirmed that 24 scaffolds exceeded 10 Mbp and classified them as chromosome-level. The BUSCO completeness was over 97%. A total gene set of 32,224 was identified. Furthermore, we analyzed the presence of AFGP genes, classified into Antarctic and sub-Antarctic categories through phylogenetic analysis. 
This study provides a useful resource for the genomic analysis of Patagonian toothfish and genetic insights into the comparison with Antarctic fishes.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1240"},"PeriodicalIF":5.8,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11569150/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142644766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-15 | DOI: 10.1038/s41597-024-04105-2
Yangyang Liang, Huijuan Liu, Wenxuan Lu, Jing Li, Ting Fang, Na Gao, Cheng Chen, Xiuxia Zhao, Kun Yang, Haiyang Liu
{"title":"Chromosome-level genome assembly of the smallscale yellowfin (Plagiognathops microlepis).","authors":"Yangyang Liang, Huijuan Liu, Wenxuan Lu, Jing Li, Ting Fang, Na Gao, Cheng Chen, Xiuxia Zhao, Kun Yang, Haiyang Liu","doi":"10.1038/s41597-024-04105-2","DOIUrl":"10.1038/s41597-024-04105-2","url":null,"abstract":"<p><p>The small-scale yellowfin (Plagiognathops microlepis) is a highly valued species in East Asian aquaculture due to its adaptability and high yield. However, the lack of genomic data has impeded genetic research and breeding efforts. In this study, we utilize PacBio HiFi long-read sequencing and Hi-C technologies to construct a highly detailed genome of P. microlepis at the chromosomal level. The assembly encompasses 976.41 Mb, with 99.84% of it distributed across 24 chromosomes. Notably, the contig N50 was 34.41 Mb and the scaffold N50 was 38.38 Mb. The completeness of the P. microlepis genome assembly is underscored by a BUSCO score of 98.08%. A total of 25,389 protein-coding genes were identified, with a BUSCO score of 96.98%, and 99.85% of these genes were functionally annotated. Synteny relationships at the chromosome level with Danio rerio and Chanodichthys erythropterus genomes uncover small-scale chromosomal rearrangements. This high-fidelity genome assembly serves as a pivotal resource for future studies of the genome structure, functional elements, comparative genomics, and evolutionary characteristics of P. 
microlepis and its relative species.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1234"},"PeriodicalIF":5.8,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11568295/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142639702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-15 | DOI: 10.1038/s41597-024-04041-1
Ivandro Sanches, Victor V Gomes, Carlos Caetano, Lizeth S B Cabrera, Vinicius H Cene, Thomas Beltrame, Wonkyu Lee, Sanghyun Baek, Otávio A B Penatti
{"title":"MIMIC-BP: A curated dataset for blood pressure estimation.","authors":"Ivandro Sanches, Victor V Gomes, Carlos Caetano, Lizeth S B Cabrera, Vinicius H Cene, Thomas Beltrame, Wonkyu Lee, Sanghyun Baek, Otávio A B Penatti","doi":"10.1038/s41597-024-04041-1","DOIUrl":"10.1038/s41597-024-04041-1","url":null,"abstract":"<p><p>Blood pressure (BP) is one of the most prominent indicators of potential cardiovascular disorders. Traditionally, BP measurement relies on inflatable cuffs, which are inconvenient and limit the acquisition of such important health-related information in the general population. Based on large amounts of well-collected and annotated data, deep-learning approaches present a generalization potential that arose as an alternative to enable more pervasive approaches. However, most existing work in this area currently uses datasets with limitations, such as lack of subject identification and severe data imbalance, which can result in data leakage and algorithm bias. Thus, to offer a more properly curated source of information, we propose a derivative dataset composed of 380 hours of the most common biomedical signals, including arterial blood pressure, photoplethysmography, and electrocardiogram for 1,524 anonymized subjects, each having 30 segments of 30 seconds of those signals. 
We also validated the proposed dataset through experiments using state-of-the-art deep-learning methods, as we highlight the importance of standardized benchmarks for calibration-free blood pressure estimation scenarios.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1233"},"PeriodicalIF":5.8,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11568151/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142639703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-14 | DOI: 10.1038/s41597-024-04011-7
Reed Ferber, Allan Brett, Reginaldo K Fukuchi, Blayne Hettinga, Sean T Osis
{"title":"A Biomechanical Dataset of 1,798 Healthy and Injured Subjects During Treadmill Walking and Running.","authors":"Reed Ferber, Allan Brett, Reginaldo K Fukuchi, Blayne Hettinga, Sean T Osis","doi":"10.1038/s41597-024-04011-7","DOIUrl":"10.1038/s41597-024-04011-7","url":null,"abstract":"<p><p>Quantitative biomechanical gait analysis is an important clinical and research tool for injury and disease diagnosis and treatment. However, one major criticism is that gait analysis laboratories largely operate in isolation and there is a lack of benchmark datasets, which can be used to advance research and statistical methodologies. To address this, we present an open biomechanics dataset of n = 1798 healthy and injured, young and older adults during treadmill walking and/or running at a range of gait speeds. The full dataset is available on Figshare+ and data files are contained within a series of zipped folders with folder names representing the subject ID. Each subject ID folder contains walking and/or running data containing raw marker trajectory data along with metadata for each participant. 
Five tutorials are also provided, demonstrating aspects such as loading data files, sample analyses of discrete variables, and calculating joint angles from code along with covering more complex topics such as principal component analysis for dimensionality reduction, statistical parametric mapping, and conducting unsupervised clustering.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1232"},"PeriodicalIF":5.8,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11564798/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142627209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data | Pub Date: 2024-11-14 | DOI: 10.1038/s41597-024-04063-9
Kun Cai, Liuyin Guan, Shenshen Li, Shuo Zhang, Yang Liu, Yang Liu
{"title":"Full-coverage estimation of CO<sub>2</sub> concentrations in China via multisource satellite data and Deep Forest model.","authors":"Kun Cai, Liuyin Guan, Shenshen Li, Shuo Zhang, Yang Liu, Yang Liu","doi":"10.1038/s41597-024-04063-9","DOIUrl":"10.1038/s41597-024-04063-9","url":null,"abstract":"<p><p>Monitoring China's carbon dioxide (CO<sub>2</sub>) concentration is essential for formulating effective carbon cycle policies to achieve carbon peaking and neutrality. Despite insufficient satellite observation coverage, this study utilizes high-resolution spatiotemporal data from the Orbiting Carbon Observatory 2 (OCO-2), supplemented with various auxiliary datasets, to estimate full-coverage, monthly, column-averaged carbon dioxide (XCO<sub>2</sub>) values across China from 2015 to 2022 at a spatial resolution of 0.05° via the deep forest model. The 10-fold cross-validation results indicate a correlation coefficient (R) of 0.95 and a determination coefficient (R²) of 0.90. Validation against ground-based station data yielded R values of 0.93, and R² values reached 0.81. Further validation from the Greenhouse Gases Observing Satellite (GOSAT) and the Copernicus Atmosphere Monitoring Service Reanalysis dataset (CAMS) produced R² values of 0.87 and 0.80, respectively. During the study period, CO<sub>2</sub> concentrations in China were higher in spring and winter than in summer and autumn, indicating a clear annual increase. 
The estimates generated by this study could potentially support CO<sub>2</sub> monitoring in China.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1231"},"PeriodicalIF":5.8,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11564725/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142627262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}