Computed ECD spectral data for over 10,000 chiral organic small molecules
Da Long, Zhihao Li, Xin Xu, Chengchun Liu, Hui Zhang, Xiaoyu Cao, Liulin Yang, Xinchang Wang, Fanyang Mo
Scientific Data 12(1), 1641. Published 2025-10-14.
DOI: https://doi.org/10.1038/s41597-025-05929-2
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12521594/pdf/
Abstract
Determining the absolute configuration of chiral molecules is of fundamental importance in natural products chemistry, asymmetric catalysis, and pharmaceutical development. A widely adopted approach compares experimental and theoretical electronic circular dichroism (ECD) spectra, which has proven to be a reliable method for absolute configuration assignment. However, the generation of theoretical ECD spectra via time-dependent density functional theory (TD-DFT) remains the rate-limiting step in this workflow, making its acceleration both essential and challenging. Although recent advances in deep learning offer promising strategies for establishing structure-spectrum relationships and expediting theoretical spectrum prediction, the lack of standardized and comprehensive ECD spectral datasets continues to hinder progress. This study presents the Chiral Molecular Circular Dichroism Spectral (CMCDS) dataset, a systematically structured benchmark dataset that addresses the fragmentation of existing ECD data. Characterized by high standardization, scalability, and broad molecular diversity, CMCDS facilitates deep learning applications in ECD analysis and fosters data-driven discovery of chiral molecules.
About the journal:
Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data.
The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.