Robin Curth, Theodor E Röhrkasten, Carolin Müller, Julia Westermayr
Surface Hopping Nested Instances Training Set for Excited-state Learning. Scientific Data 12(1): 1300 (2025). Journal Article, published 2025-07-26. DOI: 10.1038/s41597-025-05443-5. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12297575/pdf/
Citations: 0
Abstract
Theoretical studies of molecular photochemistry and photophysics are essential for understanding fundamental natural processes but rely on computationally demanding quantum chemical calculations. This complexity limits both direct simulations and the development of machine learning (ML) models trained on this data. To address this, we introduce SHNITSEL, a data repository containing 418,870 ab initio data points of nine organic molecules in their ground and electronically excited states. Each data point includes high-accuracy quantum chemical properties such as energies, forces, and dipole moments in the ground state and electronically excited singlet or triplet states as well as properties that arise from the coupling of electronic states, namely nonadiabatic couplings, transition dipoles, or spin-orbit couplings. Generated with state-of-the-art methods, SHNITSEL provides a robust benchmark for ML models and facilitates the development of ML-based approaches for excited-state properties.
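The abstract lists per-state properties (energies, forces, dipoles) and per-state-pair coupling properties (nonadiabatic couplings, transition dipoles, spin-orbit couplings). The following is a minimal sketch of one possible in-memory layout for such a data point, showing how array shapes scale with the number of atoms and electronic states. The class name `ExcitedStateSample`, its field names, and the shapes are hypothetical assumptions for illustration. They are not the SHNITSEL file format, and spin-orbit couplings are simplified to one complex number per state pair.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class ExcitedStateSample:
    """Hypothetical per-geometry record; field names and layouts are assumptions,
    not the SHNITSEL file format."""
    atomic_numbers: np.ndarray              # (n_atoms,)
    positions: np.ndarray                   # (n_atoms, 3) Cartesian coordinates
    energies: np.ndarray                    # (n_states,) one energy per electronic state
    forces: np.ndarray                      # (n_states, n_atoms, 3) one force field per state
    dipoles: np.ndarray                     # (n_states, 3) permanent dipole of each state
    transition_dipoles: np.ndarray          # (n_pairs, 3) one vector per state pair i < j
    nacs: Optional[np.ndarray] = None       # (n_pairs, n_atoms, 3) nonadiabatic coupling vectors
    socs: Optional[np.ndarray] = None       # (n_pairs,) complex spin-orbit couplings (simplified)

    @property
    def n_pairs(self) -> int:
        """Number of unordered state pairs i < j, i.e. n_states choose 2."""
        n = self.energies.shape[0]
        return n * (n - 1) // 2

    def validate(self) -> None:
        """Raise ValueError if the array shapes are mutually inconsistent."""
        n_atoms = self.atomic_numbers.shape[0]
        n_states = self.energies.shape[0]
        expected = {
            "positions": (n_atoms, 3),
            "forces": (n_states, n_atoms, 3),
            "dipoles": (n_states, 3),
            "transition_dipoles": (self.n_pairs, 3),
        }
        # Coupling arrays are optional: a molecule may carry NACs, SOCs, or neither.
        if self.nacs is not None:
            expected["nacs"] = (self.n_pairs, n_atoms, 3)
        if self.socs is not None:
            expected["socs"] = (self.n_pairs,)
        for name, shape in expected.items():
            actual = getattr(self, name).shape
            if actual != shape:
                raise ValueError(f"{name}: expected {shape}, got {actual}")


# Usage with random placeholder values (not real data): 12 atoms, 3 electronic states.
rng = np.random.default_rng(0)
n_atoms, n_states = 12, 3
n_pairs = n_states * (n_states - 1) // 2
sample = ExcitedStateSample(
    atomic_numbers=rng.integers(1, 9, size=n_atoms),
    positions=rng.normal(size=(n_atoms, 3)),
    energies=np.sort(rng.normal(size=n_states)),
    forces=rng.normal(size=(n_states, n_atoms, 3)),
    dipoles=rng.normal(size=(n_states, 3)),
    transition_dipoles=rng.normal(size=(n_pairs, 3)),
    nacs=rng.normal(size=(n_pairs, n_atoms, 3)),
)
sample.validate()  # passes silently when all shapes agree
print(sample.n_pairs)  # 3
```

In this sketch, each coupling property is indexed by unordered state pairs rather than by a full square matrix. This reflects the fact that these quantities describe the interaction between two states, so n states give n(n-1)/2 pairs.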
About the journal:
Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data.
The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.