Scientific Data: Latest Articles, Page 10

Quantum mechanical dataset of 836k neutral closed-shell molecules with up to 5 heavy atoms from C, N, O, F, Si, P, S, Cl, Br.

IF 6.9, Zone 2 (multidisciplinary journals)

Scientific Data, Pub Date: 2025-09-24, DOI: 10.1038/s41597-025-05428-4

Danish Khan, Anouar Benali, Scott Y H Kim, Guido Falk von Rudorff, O Anatole von Lilienfeld

Abstract: We introduce the Vector-QM24 (VQM24) dataset comprehensively covering all possible neutral closed-shell small organic and inorganic molecules with up to five heavy (p-block) atoms: C, N, O, F, Si, P, S, Cl, Br. All valid stoichiometries, Lewis-rule-consistent graphs, and stable conformers (identified via GFN2-xTB) were enumerated combinatorially, yielding 577k conformational isomers spanning 258k constitutional isomers and 5,599 unique stoichiometries. DFT (ωB97X-D3/cc-pVDZ) optimizations were performed for all, and diffusion quantum Monte Carlo (DMC@PBE0(ccECP/cc-pVQZ)) energies are provided for 10,793 lowest-energy conformers with up to 4 heavy atoms. VQM24 includes structures, vibrational modes, rotational constants, thermodynamic properties (Gibbs free energies, enthalpies, ZPVEs, entropies, heat capacities), and electronic properties such as atomization, electron interaction, exchange-correlation, dispersion energies, multipole moments (dipole to hexadecapole), alchemical potentials, Mulliken charges, and wavefunctions. Machine learning models of atomization energies on this dataset reveal significantly higher complexity than QM9, with none achieving chemical accuracy. VQM24 offers a rigorous, high-fidelity benchmark for evaluating quantum machine learning models.

Scientific Data 12(1), 1551. Full text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12460665/pdf/

Citations: 0

Using AI to Summarize US Presidential Campaign TV Advertisement Videos, 1952-2012.

IF 6.9, Zone 2 (multidisciplinary journals)

Scientific Data, Pub Date: 2025-09-24, DOI: 10.1038/s41597-025-05558-9

Adam Breuer, Bryce J Dietrich, Michael H Crespin, Matthew Butler, J A Pryse, Kosuke Imai

Abstract: This paper introduces the largest and most comprehensive dataset of US presidential campaign television advertisements, available in digital format. The dataset also includes machine-searchable transcripts and high-quality summaries designed to facilitate a variety of academic research. To date, there has been great interest in collecting and analyzing US presidential campaign advertisements, but the need for manual procurement and annotation has led many to rely on smaller subsets. We design a large-scale, parallelized, AI-based analysis pipeline that automates the laborious process of preparing, transcribing, storyboarding, and summarizing videos. We then apply this methodology to the 9,707 presidential ads from the Julian P. Kanter Political Commercial Archive. We conduct extensive human evaluations to show that these transcripts and summaries match the quality of manually generated alternatives. We illustrate the value of this data by including an application that tracks the genesis and evolution of current focal issue areas over seven decades of presidential elections. Our analysis pipeline and codebase also show how to use LLM-based tools to obtain high-quality summaries for other video datasets.

Scientific Data 12(1), 1552. Full text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12460618/pdf/

Citations: 0

A regional ocean database for the Coastal China Sea.

IF 6.9, Zone 2 (multidisciplinary journals)

Scientific Data, Pub Date: 2025-09-23, DOI: 10.1038/s41597-025-05840-w

Cece Wang, Bei Su, Jun Sun, Xiaoke Hu, Jihua Liu

Citations: 0

A dataset from a coordinated multi-site laboratory study investigating the Hue-Heat-Hypothesis.

IF 6.9, Zone 2 (multidisciplinary journals)

Scientific Data, Pub Date: 2025-09-19, DOI: 10.1038/s41597-025-05962-1

Mateus Bavaresco, Roberta Jacoby Cureau, Ilaria Pigliautile, Edit Barna, Zsofia Deme Belafi, Lorenzo Belussi, Giorgia Chinazzo, Agnese Chiucchiù, Ludovico Danza, Zhipeng Deng, Bing Dong, Natasha Hansen Gapski, Liége Garlet, Veronica Martins Gnecco, Xingtong Guo, Peiman Pilehchi Ha, Hamidreza Karimian, Roberto Lamberts, Shichao Liu, Brenda da Costa Loeser, Camilla Massucci, Ana Paula Melo, Balázs Vince Nagy, Mohamed M Ouf, Francesco Salamone, Marcel Schweiker, Anna Laura Pisello

Abstract: Understanding cross-modal environmental perception is essential for improving occupant well-being and human-centric building design. This paper presents an open-access, multi-site database developed under the IEA-EBC Annex 79 project to test the Hue-Heat Hypothesis (HHH), which hypothesizes that light hue may influence thermal perceptions. The database comprises 543 experimental rounds conducted in eight laboratories across six countries and diverse climate zones, following a shared, rigorously designed protocol. During summer and winter campaigns, participants were exposed to controlled thermal environments and counterbalanced lighting conditions (neutral, reddish, bluish). The database includes detailed metadata on environmental variables, physiological measurements (i.e., heart rate and skin temperature), and self-reported perceptual responses. It also provides standardized technical documentation for each test room, including the detailed experimental protocol and translated survey instruments. All materials are available on the Open Science Framework under the "Multi-site Hue-Heat-Hypothesis Testing" repository. This resource supports research into multi-domain human comfort, enabling analysis of cross-modal and combined effects on human perception and physiological reactions.

Scientific Data 12(1), 1549. Full text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12449443/pdf/

Citations: 0

Comparison of Three Anonymization Tools for a Health Fitness Study.

IF 6.9, Zone 2 (multidisciplinary journals)

Scientific Data, Pub Date: 2025-09-18, DOI: 10.1038/s41597-025-05823-x

Paul Francis, Gregor Jurak, Bojan Leskošek, Karen Otte, Fabian Prasser

Citations: 0

The Oldenburg Hearing Health Record (OHHR).

IF 6.9, Zone 2 (multidisciplinary journals)

Scientific Data, Pub Date: 2025-09-17, DOI: 10.1038/s41597-025-05884-y

Sumbul Jafri, Daniel Berg, Mareike Buhl, Matthias Vormann, Samira Saak, Kirsten C Wagener, Christiane M Thiel, Andrea Hildebrandt, Birger Kollmeier

Abstract: Hearing health is shaped by both measurable auditory function and the perceived ability to navigate daily life. To fully understand its complexities, objective assessments of hearing and functional performance must be complemented by subjective reports on lived hearing experiences. The Oldenburg Hearing Health Record (OHHR) was developed to unite these measures, offering a comprehensive and open-access resource for hearing health. It contains data from 581 adults aged 18-86 years (255 females; mean age = 67.31 years; SD = 11.93) with varying degrees of hearing loss. Data were collected between 2013 and 2015 at the Hörzentrum Oldenburg in collaboration with Hearing4all. OHHR includes audiometric tests (Pure Tone Audiometry, Loudness Scaling, Speech in Noise tests), self-reports on hearing difficulties, lifestyle, technology use, and cognitive assessments (DemTect, Vocabulary size test). These measurements remain relevant in clinical and research settings. The dataset supports cross-disciplinary analyses linking hearing ability with cognition and quality of life, contributing to personalized hearing healthcare and advancing precision medicine.

Scientific Data 12(1), 1546. Full text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12443984/pdf/

Citations: 0

Unified 0.25-degree gridded infrastructure-critical extreme weather for the United States from 1979 to 2100.

IF 6.9, Zone 2 (multidisciplinary journals)

Scientific Data, Pub Date: 2025-09-12, DOI: 10.1038/s41597-025-05918-5

Tao Sun, Chad Zanocco, June Flora, Aditi Sheshadri, Ram Rajagopal

Abstract: Extreme weather events can severely disrupt critical infrastructure, triggering cascading effects on power, transportation, and essential services. However, standard weather and climate datasets often lack specialized variables necessary for hazard assessments. We present a unified dataset of infrastructure-critical weather and climate variables across the United States at 0.25° resolution, covering daily or sub-daily intervals from 1979 to 2100. The dataset includes temperature, dew point, wind gusts, precipitation partitioned by rain, snow, and freezing rain or ice pellets, lightning, and wildfire metrics. Historical conditions (1979-2023) are synthesized from observations and reanalysis products, while future projections are derived from 14 CMIP6 global climate models (historical, SSP245, and SSP585 experiments). Physically based and data-driven methods are used to estimate variables not directly provided by existing models. By integrating these variables into a single unified dataset, we enable consistent, high-resolution assessments of weather-related infrastructure risks across past and future periods, supporting wide-ranging applications in energy, transportation, water resources, emergency management, and beyond.

Scientific Data 12(1), 1544. Full text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12432142/pdf/

Citations: 0

Chromosome-level assembly of the 400-year-old Goethe's Palm (Chamaerops humilis L.).

IF 6.9, Zone 2 (multidisciplinary journals)

Scientific Data, Pub Date: 2025-09-10, DOI: 10.1038/s41597-025-05673-7

Núria Beltran-Sanz, Stefan Prost, Veronica Malavasi, Silvia Moschin, Carola Greve, Tilman Schell, Tomas Morosinotto, Francesco Dal Grande

Abstract: The rapid decline in global biodiversity highlights the urgent need for conservation efforts, with botanical gardens playing a crucial role in ex situ plant preservation. Monumental plants, such as the 400-year-old Goethe's Palm (Chamaerops humilis L.) at the Padua Botanical Garden serve as vital flagship species with significant ecological and cultural value. In this study, we present the first chromosome-level genome assembly of C. humilis, using PacBio HiFi and Arima Hi-C sequencing technologies. The assembled genome spans 4.41 Gbp with a scaffold N50 length of 195 Mbp and it includes 18 pseudo-chromosomes. Repetitive elements constituted approximately 88% of the genome, with Long Terminal Repeats (LTR) alone comprising 63%. A total of 28,321 protein-coding genes were predicted and annotated. The Goethe's Palm genome assembly is a valuable resource for exploring both its cultural and historical significance, as well as the genetic basis of adaptive traits that allow this palm to thrive in Mediterranean environments.

Scientific Data 12(1), 1542. Full text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12423324/pdf/

Citations: 0
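
The Goethe's Palm entry above reports a scaffold N50 of 195 Mbp. For readers unfamiliar with the metric, the following is a minimal sketch of how an N50 is conventionally computed: the length of the scaffold at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy scaffold lengths are illustrative assumptions, not values from the published assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    of the scaffold at which the cumulative sum first covers at least half
    of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 requires at least one positive length")


# Hypothetical scaffold lengths (arbitrary units), chosen only for illustration.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
print(f"Toy assembly size: {sum(toy_scaffolds)}, N50: {toy_n50}")
```

A larger N50 means that half of the assembly sits in fewer, longer scaffolds, which is why the value is commonly quoted alongside total length as a contiguity indicator.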

Vessel diameters of 14 basal cerebral arteries assessed in 1000 digital subtraction angiographies.

IF 6.9, Zone 2 (multidisciplinary journals)

Scientific Data, Pub Date: 2025-09-08, DOI: 10.1038/s41597-025-05908-7

Till Gumbel, Cindy Richter, Christian Martin, Ulf Nestler

Abstract: Angiographic normative values for the size of intracranial vessels are difficult to obtain, since they vary with gender, height and weight. Cerebral angiography only is indicated in severe cerebrovascular diseases, which also can affect cerebral vessel diameters, impeding the definition of physiological values. To approximate "normal" values, over 1000 contemporary cerebral angiographies from a single neurovascular centre were analyzed. Diameters of 14 basal cerebral arteries, age at examination, gender and underlying disease were noted. The dataset (SPSS 29, IBM) comprises 1010 digital subtraction angiographies. For example, a significant difference (p < 0.001) in the size of the left carotid artery between male (3.23 mm, n = 361, sd = 0.49) and female (3.09 mm, n = 645, sd = 0.52) patients is found. The data can be used to compute intraindividual indices in given diseases, e.g. whether an enlarged diameter of the right media, calculated as ratio to the left media or to the ipsilateral carotid artery, is associated to cerebral aneurysms. The dataset allows for training of machine learning programs, e.g. to predict ischemic stroke or cerebral hemorrhage.

Scientific Data 12(1), 1543. Full text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12417541/pdf/

Citations: 0
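
The vessel-diameter entry above quotes group means, standard deviations, and sample sizes for the left carotid artery and states p < 0.001. As a rough plausibility check, the sketch below recomputes a two-sample comparison from those summary statistics alone. It assumes a Welch (unequal-variance) test with a normal approximation for the p-value; the abstract does not say which test the authors ran, so this is an illustration and not a reproduction of their analysis. The function name `welch_from_summary` is hypothetical.

```python
import math


def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic, Welch-Satterthwaite degrees of freedom, and an
    approximate two-sided p-value computed from summary statistics only.

    The p-value uses the standard normal distribution, which is a close
    approximation to the t distribution when the degrees of freedom are
    in the hundreds, as they are here.
    """
    var1 = sd1 ** 2 / n1  # squared standard error of the first mean
    var2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(var1 + var2)
    dof = (var1 + var2) ** 2 / (var1 ** 2 / (n1 - 1) + var2 ** 2 / (n2 - 1))
    p_two_sided = math.erfc(abs(t) / math.sqrt(2))
    return t, dof, p_two_sided


# Values as reported in the abstract: male vs. female left carotid diameter (mm).
t_stat, dof, p_approx = welch_from_summary(3.23, 0.49, 361, 3.09, 0.52, 645)
print(f"t = {t_stat:.2f}, df = {dof:.0f}, approximate p = {p_approx:.1e}")
```

Under these assumptions the statistic comes out a little above 4, which is consistent with the reported p < 0.001.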

Chromosome-level genome assembly of the Tyrrhenian tree frog (Hyla sarda).

IF 6.9, Zone 2 (multidisciplinary journals)

Scientific Data, Pub Date: 2025-09-02, DOI: 10.1038/s41597-025-05760-9

Josephine R Paris, Roberta Bisconti, Andrea Chiocchio, Linelle Abueg, Dominic E Absolon, Tatiana Tilley, Nivesh Jain, Jennifer Balacco, Brian O'Toole, Erich D Jarvis, Giulio Formenti, Daniele Salvi, Daniele Canestrelli

Abstract: The Tyrrhenian tree frog (Hyla sarda) is a small cryptically coloured amphibian found in Corsica, Sardinia, and the Tuscan Archipelago. Investigation into the species' evolutionary history has revealed phenotypic changes triggered by glaciation-induced range expansion, but understanding the genetic basis of this trait variation has been hampered by the lack of a reference genome. To address this, we assembled a chromosome-level genome of Hyla sarda using PacBio HiFi long reads, Bionano optical maps, and Hi-C data. The assembly comprises 13 assembled chromosomes, spanning a total length of 4.15 Gb with a scaffold N50 of 385 Mb, a BUSCO completeness of 94.60%, and a k-mer completeness of 98.30%. Approximately 75% of the genome consists of repetitive elements. We annotated 22,847 protein-coding genes with a BUSCO completeness of 94.60% and an OMArk completeness of 93.74%. This high-quality assembly provides a valuable resource for studying phenotypic evolution and its genomic basis during range expansion, and will assist future investigations into the population and conservation genomics of Hyla sarda.

Scientific Data 12(1), 1539. Full text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12405506/pdf/

Citations: 0