QM9star, two Million DFT-computed Equilibrium Structures for Ions and Radicals with Atomic Information
Miao-Jiong Tang, Tian-Cheng Zhu, Shuo-Qing Zhang, Xin Hong
Scientific Data 11(1): 1158, published 2024-10-21. DOI: 10.1038/s41597-024-03933-6
Abstract
Ions and radicals are key intermediates in molecular transformations, and their chemical properties are essential for understanding and predicting reactivity and selectivity. In this data descriptor, we report QM9star, a quantum chemical dataset of cations, anions, and radicals. The dataset is derived from the molecular structures in the QM9 dataset: terminal hydrogens are removed, and the resulting structures are optimized at the B3LYP-D3(BJ)/6-311+G(d,p) level of density functional theory. QM9star contains approximately 1.9 million cations, anions, and radicals, along with about 120 thousand neutral molecules prior to hydrogen removal. Each entry includes both molecular and atomic information: global properties include orbital energies and vibrational frequencies, among others, while local properties include the charge and spin density at each atomic site. Beyond serving as a comprehensive source of quantum chemical information on reactive intermediates, QM9star also offers insight into the principles governing how atomic properties are distributed. We anticipate that these data will support machine learning studies of chemical intermediates and contribute to molecular representation learning.
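The abstract describes generating each ion or radical by removing a terminal hydrogen from a QM9 parent molecule. As an illustration only, the bookkeeping for that step can be sketched in Python. The `Molecule` and `Species` classes below are hypothetical and do not reflect the dataset's actual file format. Removing a hydride (H⁻) leaves a cation, removing a proton (H⁺) leaves an anion, and removing a hydrogen atom (H·) leaves a neutral doublet radical. The rule of one removal per hydrogen-bearing heavy atom, with no merging of symmetry-equivalent sites, is an assumption rather than the published QM9star protocol, and the DFT optimization step is omitted entirely.

```python
from dataclasses import dataclass


# Hypothetical minimal molecule representation (NOT the QM9star file format):
# element symbols plus an undirected bond list of atom-index pairs.
@dataclass(frozen=True)
class Molecule:
    elements: tuple[str, ...]
    bonds: tuple[tuple[int, int], ...]


@dataclass(frozen=True)
class Species:
    kind: str          # "cation", "anion" or "radical"
    site: int          # index of the heavy atom that lost a hydrogen
    removed_h: int     # index of the removed hydrogen in the parent
    charge: int        # total charge of the fragment
    multiplicity: int  # spin multiplicity 2S + 1


# Losing H- leaves a closed-shell cation, losing H+ leaves a closed-shell
# anion, and losing H. leaves a neutral radical with one unpaired electron.
REMOVALS = (
    ("cation", +1, 1),
    ("anion", -1, 1),
    ("radical", 0, 2),
)


def terminal_hydrogens(mol: Molecule) -> dict[int, list[int]]:
    """Map each heavy atom to the indices of the hydrogens bonded to it."""
    sites: dict[int, list[int]] = {}
    for a, b in mol.bonds:
        for heavy, h in ((a, b), (b, a)):
            if mol.elements[h] == "H" and mol.elements[heavy] != "H":
                sites.setdefault(heavy, []).append(h)
    return sites


def enumerate_species(mol: Molecule) -> list[Species]:
    """Emit one cation, anion and radical per hydrogen-bearing heavy atom.

    Assumption: symmetry-equivalent sites are not merged, and the real
    QM9star enumeration may differ from this simple rule.
    """
    species = []
    for site, hs in sorted(terminal_hydrogens(mol).items()):
        # Every H on the same heavy atom yields the same bond graph,
        # so the lowest-index hydrogen is used as the representative.
        h = min(hs)
        for kind, charge, mult in REMOVALS:
            species.append(Species(kind, site, h, charge, mult))
    return species
```

For methanol, encoded as `Molecule(("C", "O", "H", "H", "H", "H"), ((0, 1), (0, 2), (0, 3), (0, 4), (1, 5)))`, the sketch yields six species: a cation, anion, and radical from C-H removal at carbon and three more from O-H removal at oxygen. Each would then still need a geometry optimization before it resembled a dataset entry.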
About the journal:
Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data.
The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.