Prediction of Activation Energies of Organic Molecules With at Most Seven Non-Hydrogen Atoms Using Quantum-Chemically Assisted ML

IF 3.4 · Zone 3 (Chemistry) · JCR Q2 (Chemistry, Multidisciplinary)
K. G. Kalamatianos, Olga N. Flenga
Journal of Computational Chemistry, vol. 46, no. 8. Published 2025-03-20. DOI: 10.1002/jcc.70083
https://onlinelibrary.wiley.com/doi/10.1002/jcc.70083

Abstract

In this study, a hybrid machine learning (ML) approach is presented for accurately predicting activation energies (Eₐ) of gas-phase elementary reactions involving organic compounds with up to seven non-hydrogen atoms. Given the importance of activation energies in reaction studies and modeling, composite ML models were created that integrate molecular descriptors with semi-empirical and single-point energy density functional theory (DFT) calculations. The dataset, containing 300 randomly selected elementary gas-phase reactions, was assembled from accurate DFT (ωB97X-D3/def2-TZVP) activation energies taken from a database, alongside semi-empirical computations. Accurate predictions required including both physical organic and geometric/empirical descriptors in the training procedure. The two best ML models predicted Eₐ well in validation tests, achieving a mean absolute error (MAE) of 1.314 kcal mol⁻¹ and R² of 0.992 (Model 3) and an MAE of 1.949 kcal mol⁻¹ and R² of 0.979 (Model 2). Notably, this performance approaches the "chemical accuracy" threshold of 1 kcal mol⁻¹. Model 3's robustness was tested across the reaction types present in the dataset, demonstrating its ability to predict activation energies reliably, which is critical for the study and optimization of chemical processes.
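The two validation metrics quoted in the abstract, MAE and R², can be illustrated with a minimal sketch. The code below is not the authors' pipeline: the function names and the four activation energies are hypothetical and chosen only to show how the metrics are computed and compared against the 1 kcal mol⁻¹ threshold.

```python
from statistics import fmean

# "Chemical accuracy" threshold quoted in the abstract, in kcal/mol.
CHEMICAL_ACCURACY = 1.0


def mean_absolute_error(reference, predicted):
    """Average absolute deviation between reference and predicted values."""
    return fmean(abs(r - p) for r, p in zip(reference, predicted))


def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = fmean(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Hypothetical activation energies in kcal/mol (illustration only,
# not values from the paper or its dataset).
reference_ea = [10.0, 20.0, 30.0, 40.0]   # e.g. DFT reference values
predicted_ea = [11.0, 19.0, 32.0, 40.0]   # e.g. ML model predictions

mae = mean_absolute_error(reference_ea, predicted_ea)
r2 = r_squared(reference_ea, predicted_ea)
within_chemical_accuracy = mae <= CHEMICAL_ACCURACY

print(f"MAE = {mae:.3f} kcal/mol, R^2 = {r2:.3f}")
print("Within chemical accuracy:", within_chemical_accuracy)
```

By this definition, the reported MAE values of 1.314 and 1.949 kcal mol⁻¹ lie above the 1 kcal mol⁻¹ threshold, which is why the abstract describes the performance as approaching chemical accuracy.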

Source journal: Journal of Computational Chemistry
CiteScore: 6.60
Self-citation rate: 3.30%
Articles published: 247
Review time: 1.7 months
Journal description: This distinguished journal publishes articles concerned with all aspects of computational chemistry: analytical, biological, inorganic, organic, physical, and materials. The Journal of Computational Chemistry presents original research, contemporary developments in theory and methodology, and state-of-the-art applications. Computational areas that are featured in the journal include ab initio and semiempirical quantum mechanics, density functional theory, molecular mechanics, molecular dynamics, statistical mechanics, cheminformatics, biomolecular structure prediction, molecular design, and bioinformatics.