Quantum Descriptor-Based Machine-Learning Modeling of Thermal Hazard of Cyclic Sulfamidates

Impact factor 5.3; Zone 2, Chemistry; JCR Q1, CHEMISTRY, MEDICINAL
Michal Dabros*, Hagen Münkler, Florence Yerly, Roger Marti, Michaël Parmentier, and Anikó Udvarhelyi*
Journal of Chemical Information and Modeling, volume 65, issue 16, pages 8624–8636. Published 2025-08-15. Publication type: Journal Article.
DOI: 10.1021/acs.jcim.5c01048 (https://pubs.acs.org/doi/10.1021/acs.jcim.5c01048)
Citations: 0

Abstract


Cyclic sulfamidates are commonly used building blocks in organic synthesis. Correct classification of their thermal criticality is crucial for the safe use of these compounds in process development and scale-up. In this study, building on our earlier work (Ferrari et al., 2022), we focused on modeling the reaction enthalpy of a family of 5-membered cyclic sulfamidates toward strong bases. The key challenge for the modeling task was the sparse availability of measured reaction enthalpies, with only 29 measurements available. To address this challenge, we used descriptors based on the quantum-chemical properties of the molecules, as they are more closely related to reaction enthalpies than typical cheminformatics-based descriptors. This approach allowed us to avoid relying solely on data-to-fit models and to focus instead on modeling reaction enthalpies using chemistry-aware techniques, which are more appropriate for small data sets. Three models were constructed using the quantum-chemical descriptors: the first one combining Partial Least Squares (PLS) regression with a Genetic Algorithm (GA), the second one based on the Least Absolute Shrinkage and Selection Operator (LASSO) method, and last, a Gaussian Process Regression (GPR) model. The three models achieved coefficients of determination of 0.78, 0.67, and 0.74, respectively. Although the absolute prediction error values were close to 100 J/g, it is noteworthy that all three techniques provided similar results and accurately classified nearly all compounds into their respective thermal criticality classes. This highlights the methodology’s effectiveness in providing a reliable framework for preliminary safety assessment and decision-making in process development.

Source journal
CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months
About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.