Scientific Data Pub Date: 2025-04-08 DOI: 10.1038/s41597-025-04921-0
Eliza V C Alves-Ferreira, Madeline R Galac, Hernan A Lorenzi, Margaret C W Ho, Erick T Tjhin, Ana Popovic, John Parkinson, Michael E Grigg
{"title":"Whole Genome Sequence of the gut commensal protist Tritrichomonas musculus isolated from laboratory mice.","authors":"Eliza V C Alves-Ferreira, Madeline R Galac, Hernan A Lorenzi, Margaret C W Ho, Erick T Tjhin, Ana Popovic, John Parkinson, Michael E Grigg","doi":"10.1038/s41597-025-04921-0","DOIUrl":"https://doi.org/10.1038/s41597-025-04921-0","url":null,"abstract":"Tritrichomonas musculus is a commensal protist colonizing the large intestine of laboratory mice. Parasite colonization reshapes the gut microbiome and modulates mucosal immunity. This parasite is refractory to axenic culture. To facilitate functional genomic investigations, we assembled a 193.49 Mbp high-quality reference genome from FACS-purified parasites recovered from monocolonized mice, using an integrated approach that combined long-read (PacBio and Oxford Nanopore) sequencing technologies for the draft genome assembly. The genome assembled into 756 contigs, and RNA-Seq data were used to support the gene models for 46,131 annotated genes. Of these, 24,215 genes had an InterPro, Enzyme Commission and/or Gene Ontology annotation. BUSCO analyses established that 53% of the genome annotations matched with available BUSCO genes in the eukaryote_odb10 database. This high-quality reference genome will serve as a valuable resource to develop a metabolic and genetic model to grow T. musculus axenically and study genes relevant to its biology, life cycle transmission, and pathogenesis.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"590"},"PeriodicalIF":5.8,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143812239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data Pub Date: 2025-04-08 DOI: 10.1038/s41597-025-04861-9
Eva Guttmann-Flury, Xinjun Sheng, Xiangyang Zhu
{"title":"Dataset combining EEG, eye-tracking, and high-speed video for ocular activity analysis across BCI paradigms.","authors":"Eva Guttmann-Flury, Xinjun Sheng, Xiangyang Zhu","doi":"10.1038/s41597-025-04861-9","DOIUrl":"https://doi.org/10.1038/s41597-025-04861-9","url":null,"abstract":"In Brain-Computer Interface (BCI) research, the detailed study of blinks is crucial. Blinks can be considered as noise, affecting the efficiency and accuracy of decoding users' cognitive states and intentions, or as potential features, providing valuable insights into users' behavior and interaction patterns. We introduce a large dataset capturing electroencephalogram (EEG) signals, eye-tracking, high-speed camera recordings, as well as subjects' mental states and characteristics, to provide a multifactor analysis of eye-related movements. Four paradigms - motor imagery, motor execution, steady-state visually evoked potentials, and P300 spellers - are selected due to their capacity to evoke various sensory-motor responses and their potential influence on ocular activity. This dataset, available online, contains over 46 hours of data from 31 subjects across 63 sessions, totaling 2520 trials for each of the first three paradigms, and 5670 for P300. This multimodal, multi-paradigm dataset is expected to allow the development of algorithms capable of efficiently handling eye-induced artifacts and enhancing task-specific classification. Furthermore, it offers the opportunity to evaluate cross-paradigm robustness with the same participants.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"587"},"PeriodicalIF":5.8,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143812236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data Pub Date: 2025-04-08 DOI: 10.1038/s41597-025-04927-8
Zhendong Gao, Bo Wang, Ying Lu, Yuqing Chong, Mengfei Li, Jieyun Hong, Jiao Wu, Dongmei Xi, Weidong Deng
{"title":"De novo transcriptome assembly and annotation of the semi-wild Gayal (Bos frontalis).","authors":"Zhendong Gao, Bo Wang, Ying Lu, Yuqing Chong, Mengfei Li, Jieyun Hong, Jiao Wu, Dongmei Xi, Weidong Deng","doi":"10.1038/s41597-025-04927-8","DOIUrl":"https://doi.org/10.1038/s41597-025-04927-8","url":null,"abstract":"The Gayal (Bos frontalis) is a rare semi-wild bovine species that inhabits the harsh environments of Indo-China. Although the origins of the Gayal remain largely enigmatic, addressing the lack of comprehensive transcriptomic data is critical for understanding its genetic and molecular characteristics, which are essential for formulating effective conservation and management plans. In this study, an integrated PacBio Iso-seq and RNA-seq analysis was conducted on samples from 10 different organs and tissues of the Gayal, with each being sequenced in triplicate. The samples analyzed included the heart, liver, spleen, lung, kidney, rumen, abomasum, duodenum, ileum, and rectum. This analysis resulted in the identification of 30,760 full-length transcripts ranging from 363 bp to 7,157 bp, with transcript information matched to seven commonly used databases. Gene family clustering and phylogenetic analyses encompassed a dataset of nine bovine species, including the Gayal. Additionally, long non-coding RNAs (lncRNAs) were identified across all sampled tissues, and comprehensive gene expression profiles and differential gene expression analyses were performed. These findings provide a rich repository of genetic information, laying the foundation for functional genomics studies and deeper insights into the molecular mechanisms of the Gayal, thereby advancing our understanding of its transcriptome architecture and offering crucial data for conservation efforts and practical applications.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"589"},"PeriodicalIF":5.8,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143812237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data Pub Date: 2025-04-08 DOI: 10.1038/s41597-025-04940-x
V Bala Chaudhary, Liam F Nokes, Jennifer B González, Peri O Cooper, Anne M Katula, Emma C Mares, Smriti Pehim Limbu, Jannetta N Robinson, Carlos A Aguilar-Trigueros
{"title":"TraitAM, a global spore trait database for arbuscular mycorrhizal fungi.","authors":"V Bala Chaudhary, Liam F Nokes, Jennifer B González, Peri O Cooper, Anne M Katula, Emma C Mares, Smriti Pehim Limbu, Jannetta N Robinson, Carlos A Aguilar-Trigueros","doi":"10.1038/s41597-025-04940-x","DOIUrl":"https://doi.org/10.1038/s41597-025-04940-x","url":null,"abstract":"Knowledge regarding organismal traits supports a better understanding of the relationship between form and function and can be used to predict the consequences of environmental stressors on ecological and evolutionary processes. Most plants on Earth form symbioses with mycorrhizal fungi, but our ability to make trait-based inferences for these fungi is limited due to a lack of publicly available trait data. Here, we present TraitAM, a comprehensive database of multiple spore traits for all described species of the most common group of mycorrhizal fungi, the arbuscular mycorrhizal (AM) fungi (subphylum Glomeromycotina). Trait data for 344 species were mined from original species descriptions and used to calculate newly developed fungal trait metrics that can be employed to explore both intra- and inter-specific variation in traits. TraitAM also includes an updated phylogenetic tree that can be used to conduct phylogenetically informed multivariate analyses of AM fungal traits. TraitAM will further our understanding of the biology, ecology, and evolution of these globally widespread, symbiotic fungi.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"588"},"PeriodicalIF":5.8,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143812238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data Pub Date: 2025-04-07 DOI: 10.1038/s41597-025-04936-7
Huiliang Li, Xin Gao, Yongcheng Zhao, Jie Zhou, Zihao Hu, Zhuo Chen, Zuowei Yang, Shengyu Li
{"title":"A comprehensive grain-size database of surface sediments from the Taklamakan Desert.","authors":"Huiliang Li, Xin Gao, Yongcheng Zhao, Jie Zhou, Zihao Hu, Zhuo Chen, Zuowei Yang, Shengyu Li","doi":"10.1038/s41597-025-04936-7","DOIUrl":"https://doi.org/10.1038/s41597-025-04936-7","url":null,"abstract":"This study compiles the most comprehensive open-access surface sediment grain-size database (n = 596 samples) spanning the entire Taklamakan Desert, obtained through systematic field sampling and laser diffraction analysis. It provides essential data for understanding the desert's formation, evolution, and sand sources, and for the restoration of aeolian environments. By analyzing key sediment parameters (mean grain size, sorting, skewness, kurtosis) and particle compositions, the dataset reveals sediment transport dynamics and depositional processes critical for understanding desert formation, sand provenance, and aeolian environmental reconstruction. The quantitative characterization of sediment texture and sorting mechanisms provides foundational data for investigating regional dust emissions, wind erosion patterns, and sediment transport capacities. While the primary focus is on the Taklamakan Desert, the methodology and dataset apply to other arid regions, making it a valuable resource for comparative desert studies. It is an indispensable tool for researchers investigating desert landscapes and addressing environmental challenges related to desertification and aeolian processes.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"585"},"PeriodicalIF":5.8,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143804046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data Pub Date: 2025-04-07 DOI: 10.1038/s41597-025-04933-w
Yan Liu, Changqing Song, Sijing Ye, Jiaying Lv, Peichao Gao
{"title":"Daily Max Simplified Wet-Bulb Globe Temperature and its Climate Networks for Teleconnection Study, 1940-2022.","authors":"Yan Liu, Changqing Song, Sijing Ye, Jiaying Lv, Peichao Gao","doi":"10.1038/s41597-025-04933-w","DOIUrl":"https://doi.org/10.1038/s41597-025-04933-w","url":null,"abstract":"As global warming intensifies, extreme heat events, especially those occurring simultaneously or sequentially in multiple regions, are becoming more frequent. This highlights the growing need to analyze heat stress from the perspectives of human health and spatiotemporal correlations. Wet-Bulb Globe Temperature (WBGT) is a well-established heat stress indicator closely linked to human health. However, its reliance on specialized measurements and resource-intensive computations limits its widespread use, particularly for researchers without an earth sciences background. To address this, we adopted a simplified WBGT (sWBGT), which effectively simulates human cooling through sweating, to generate a global 2° resolution dataset of daily maximum sWBGT from 1940 to 2022. This dataset fills a critical gap in long-term, global-scale heat stress data. Additionally, we employed climate network methods to explore teleconnections of extreme heat events, providing a tool to reveal their spatiotemporal relationships and supporting the development of effective health protection strategies.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"584"},"PeriodicalIF":5.8,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143804196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data Pub Date: 2025-04-07 DOI: 10.1038/s41597-025-04918-9
Bruno Calderón-Hernández, Horacio Larreguy, John Marshall, José Luis Pérez-Castellanos
{"title":"Electoral precinct-level database for Mexican municipal elections.","authors":"Bruno Calderón-Hernández, Horacio Larreguy, John Marshall, José Luis Pérez-Castellanos","doi":"10.1038/s41597-025-04918-9","DOIUrl":"https://doi.org/10.1038/s41597-025-04918-9","url":null,"abstract":"This paper introduces a database of electoral precinct-level election returns for Mexican municipal elections between 1994 and 2019. This database includes: (i) electoral precinct-level votes for each electoral coalition, the coalitions of the incumbent mayor and incumbent state governor, and the four most popular political parties; (ii) electoral precinct-level valid and total votes, the number of registered voters, and turnout; (iii) the partisan composition and municipal-level votes of the incumbent and runner-up electoral coalitions from the previous election; and (iv) the partisan composition of the state-level incumbent governor. This paper outlines the organization of these data, their sources, and key variables, and describes the processes used to standardize the data. This database has the potential to support the cross-sectional and longitudinal study of local Mexican elections over two decades using fine-grained precinct-level electoral returns that enable panel and regression discontinuity analyses.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"582"},"PeriodicalIF":5.8,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143804200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A global near real-time dataset of Microwave Integrated Drought Index from the Fengyun-3 satellites.","authors":"Anzhi Zhang, Hao Gao, Ronghan Xu, Xiaoqing Li, Huichen Zhao, Gensuo Jia","doi":"10.1038/s41597-025-04935-8","DOIUrl":"https://doi.org/10.1038/s41597-025-04935-8","url":null,"abstract":"Droughts have become more frequent and intense with increasing climate warming, posing widespread risks to ecosystems, agriculture, and water resources; effective and timely drought monitoring is therefore critical to drought assessment, management, and mitigation. Here, we present a global monthly and ten-day drought dataset of the Fengyun-3 Microwave Integrated Drought Index (FY-3 MIDI), produced by integrating the inconsistency-corrected FY-3B/C/D derived microwave precipitation, soil moisture, and land surface temperature with optimal weights from June 2014 to present. The dataset was evaluated and validated against the Standardized Precipitation Evapotranspiration Index, the Self-calibrating Palmer Drought Severity Index, and the non-FY MIDI at 0.25°. The FY-3 MIDI can effectively observe drought conditions and characteristics as captured by the reference datasets, and it is reliable in monitoring meteorological drought, with the ability to work in all weather conditions. Based on the operational Fengyun-3 series satellites, it provides valuable operational service in near real-time on monthly and ten-day time scales, guaranteeing present and future continuous applications to support global and regional drought monitoring and assessment.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"583"},"PeriodicalIF":5.8,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143804140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The first urban open space product of global 169 megacities using remote sensing and geospatial data.","authors":"Runyu Fan, Lizhe Wang, Zijian Xu, Hongyang Niu, Jiajun Chen, Zhaoying Zhou, Wenyue Li, Haoyu Wang, Yuyue Sun, Ruyi Feng","doi":"10.1038/s41597-025-04924-x","DOIUrl":"https://doi.org/10.1038/s41597-025-04924-x","url":null,"abstract":"Urban open space (UOS) plays an important environmental role, especially in areas characterized by intense social and economic activity. However, the high interclass similarities, complex surroundings, and scale variations of UOS lead to unsatisfactory UOS mapping performance, and UOS mapping products for major cities around the world are lacking. To fill this gap, we used a deep learning method built on a tiny-manual annotation strategy and optical remote sensing imagery to produce a 1.19 m resolution UOS map of 169 megacities, namely the OpenspaceGlobal product, which distinguishes five urban open space categories. To obtain the final OpenspaceGlobal product, we processed over 8.5 TB of remote sensing images and nearly 90 million polygons in crowdsourced geospatial data. The validation results showed that the OpenspaceGlobal product had an overall accuracy of 79.13% and a kappa coefficient of 73.47%. The OpenspaceGlobal product can promote a better understanding of human-made space surfaces in major cities around the world.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"586"},"PeriodicalIF":5.8,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143804205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific Data Pub Date: 2025-04-05 DOI: 10.1038/s41597-025-04753-y
Qiwei Lin, Derek Ouyang, Cameron Guage, Isabel O Gallegos, Jacob Goldin, Daniel E Ho
{"title":"Enabling disaggregation of Asian American subgroups: a dataset of Wikidata names for disparity estimation.","authors":"Qiwei Lin, Derek Ouyang, Cameron Guage, Isabel O Gallegos, Jacob Goldin, Daniel E Ho","doi":"10.1038/s41597-025-04753-y","DOIUrl":"https://doi.org/10.1038/s41597-025-04753-y","url":null,"abstract":"Decades of research and advocacy have underscored the imperative of surfacing - as the first step towards mitigating - racial disparities, including among subgroups historically bundled into aggregated categories. Recent U.S. federal regulations have required increasingly disaggregated race reporting, but major implementation barriers mean that, in practice, reported race data remain inadequate. While imputation methods have enabled disparity assessments in many research and policy settings lacking reported race, the leading name algorithms cannot recover disaggregated categories, given the same lack of disaggregated data from administrative sources to inform algorithm design. Leveraging a Wikidata sample of over 300,000 individuals from six Asian countries, we extract frequencies of 25,876 first names and 18,703 surnames, which can be used as proxies for U.S. name-race distributions among six major Asian subgroups: Asian Indian, Chinese, Filipino, Japanese, Korean, and Vietnamese. We show that these data, when combined with public geography-race distributions to predict subgroup membership, outperform existing deterministic name lists in key prediction settings, and enable critical Asian disparity assessments.","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"580"},"PeriodicalIF":5.8,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11972315/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143788684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Multidisciplinary journals","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}