Hammett-Inspired Product Baseline for Data-Efficient Δ-ML in Chemical Space

IF 5.5 · CAS Tier 1 (Chemistry) · JCR Q2 (Chemistry, Physical)
V. Diana Rakotonirina, Marco Bragato, Guido Falk von Rudorff, and O. Anatole von Lilienfeld*
{"title":"哈米特启发的数据高效产品基线Δ-ML在化学领域。","authors":"V. Diana Rakotonirina,&nbsp;, ,&nbsp;Marco Bragato,&nbsp;, ,&nbsp;Guido Falk von Rudorff,&nbsp;, and ,&nbsp;O. Anatole von Lilienfeld*,&nbsp;","doi":"10.1021/acs.jctc.5c00848","DOIUrl":null,"url":null,"abstract":"<p >Data-hungry machine learning methods have become a new standard to efficiently navigate chemical compound space for molecular and materials design and discovery. Due to the severe scarcity and cost of high-quality experimental or synthetic simulated training data, however, data-acquisition costs can be considerable. Relying on reasonably accurate approximate legacy baseline labels with low computational complexity represents one of the most effective strategies to curb data-needs, e.g. through Δ-, transfer-, or multifidelity learning. A surprisingly effective and data-efficient baseline model is presented in the form of a generic coarse-graining Hammett-inspired product (HIP) Ansatz, generalizing the empirical Hammett equation toward arbitrary systems and properties. Numerical evidence for the applicability of HIP includes solvation free energies of molecules, formation energies of quaternary elpasolite crystals, carbon adsorption energies on heterogeneous catalytic surfaces, HOMO–LUMO gaps of metallorganic complexes, activation energies for S<sub>N</sub>2 reactions, and catalyst–substrate binding energies in cross-coupling reactions. After calibration on the same training sets, HIP yields an effective baseline for improved Δ-machine learning models with superior data-efficiency when compared to previously introduced specialized domain-specific models.</p>","PeriodicalId":45,"journal":{"name":"Journal of Chemical Theory and Computation","volume":"21 19","pages":"9844–9852"},"PeriodicalIF":5.5000,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hammett-Inspired Product Baseline for Data-Efficient Δ-ML in Chemical Space\",\"authors\":\"V. Diana Rakotonirina,&nbsp;, ,&nbsp;Marco Bragato,&nbsp;, ,&nbsp;Guido Falk von Rudorff,&nbsp;, and ,&nbsp;O. Anatole von Lilienfeld*,&nbsp;\",\"doi\":\"10.1021/acs.jctc.5c00848\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p >Data-hungry machine learning methods have become a new standard to efficiently navigate chemical compound space for molecular and materials design and discovery. Due to the severe scarcity and cost of high-quality experimental or synthetic simulated training data, however, data-acquisition costs can be considerable. Relying on reasonably accurate approximate legacy baseline labels with low computational complexity represents one of the most effective strategies to curb data-needs, e.g. through Δ-, transfer-, or multifidelity learning. A surprisingly effective and data-efficient baseline model is presented in the form of a generic coarse-graining Hammett-inspired product (HIP) Ansatz, generalizing the empirical Hammett equation toward arbitrary systems and properties. Numerical evidence for the applicability of HIP includes solvation free energies of molecules, formation energies of quaternary elpasolite crystals, carbon adsorption energies on heterogeneous catalytic surfaces, HOMO–LUMO gaps of metallorganic complexes, activation energies for S<sub>N</sub>2 reactions, and catalyst–substrate binding energies in cross-coupling reactions. 
After calibration on the same training sets, HIP yields an effective baseline for improved Δ-machine learning models with superior data-efficiency when compared to previously introduced specialized domain-specific models.</p>\",\"PeriodicalId\":45,\"journal\":{\"name\":\"Journal of Chemical Theory and Computation\",\"volume\":\"21 19\",\"pages\":\"9844–9852\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2025-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Chemical Theory and Computation\",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://pubs.acs.org/doi/10.1021/acs.jctc.5c00848\",\"RegionNum\":1,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"CHEMISTRY, PHYSICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Chemical Theory and Computation","FirstCategoryId":"92","ListUrlMain":"https://pubs.acs.org/doi/10.1021/acs.jctc.5c00848","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, PHYSICAL","Score":null,"Total":0}
Citations: 0

Abstract



Data-hungry machine learning methods have become a new standard to efficiently navigate chemical compound space for molecular and materials design and discovery. Due to the severe scarcity and cost of high-quality experimental or synthetic simulated training data, however, data-acquisition costs can be considerable. Relying on reasonably accurate approximate legacy baseline labels with low computational complexity represents one of the most effective strategies to curb data-needs, e.g. through Δ-, transfer-, or multifidelity learning. A surprisingly effective and data-efficient baseline model is presented in the form of a generic coarse-graining Hammett-inspired product (HIP) Ansatz, generalizing the empirical Hammett equation toward arbitrary systems and properties. Numerical evidence for the applicability of HIP includes solvation free energies of molecules, formation energies of quaternary elpasolite crystals, carbon adsorption energies on heterogeneous catalytic surfaces, HOMO–LUMO gaps of metallorganic complexes, activation energies for SN2 reactions, and catalyst–substrate binding energies in cross-coupling reactions. After calibration on the same training sets, HIP yields an effective baseline for improved Δ-machine learning models with superior data-efficiency when compared to previously introduced specialized domain-specific models.
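For context, the classical Hammett equation expresses a rate or equilibrium constant as a product of a substituent constant and a reaction constant, log10(K_X/K_H) = ρ·σ_X. The HIP Ansatz described in the abstract generalizes this multiplicative form to arbitrary systems and properties so that it can serve as a cheap, calibrated baseline inside a Δ-ML workflow, in which a machine learning model is trained only on the residual between the target property and the baseline. The sketch below is an illustration of that idea only, not the authors' implementation: it fits a rank-one product baseline to a toy property matrix by alternating least squares and then learns the residuals with kernel ridge regression; all data, descriptors, and hyperparameters are placeholders.

# Minimal sketch of a Hammett-inspired product (rank-one) baseline plus a
# Delta-ML correction. Generic stand-in code, not the published HIP model.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)

# Toy property matrix Y[i, j]: rows = "substituent"-like fragments,
# columns = "reaction"-like environments (hypothetical synthetic data).
n_sub, n_rxn = 20, 8
sigma_true = rng.normal(size=n_sub)
rho_true = rng.normal(size=n_rxn)
Y = np.outer(sigma_true, rho_true) + 0.1 * rng.normal(size=(n_sub, n_rxn))

# 1) Product baseline: fit Y ~ outer(sigma, rho) by alternating least squares.
sigma = rng.normal(size=n_sub)
rho = rng.normal(size=n_rxn)
for _ in range(50):
    rho = Y.T @ sigma / (sigma @ sigma)    # update reaction-like parameters
    sigma = Y @ rho / (rho @ rho)          # update substituent-like parameters
baseline = np.outer(sigma, rho)

# 2) Delta-ML: learn the residual (target - baseline) with kernel ridge
#    regression on some molecular representation X (random placeholders here).
X = rng.normal(size=(n_sub * n_rxn, 16))
residual = (Y - baseline).ravel()
model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.1).fit(X, residual)

# Final Delta-ML prediction = cheap baseline + learned correction.
# (In-sample evaluation only; a real study would use held-out systems.)
y_pred = baseline.ravel() + model.predict(X)
print("baseline MAE:", np.abs(Y.ravel() - baseline.ravel()).mean())
print("Delta-ML MAE:", np.abs(Y.ravel() - y_pred).mean())

In practice, the data-efficiency gain reported in the paper refers to held-out predictions after calibrating the baseline and the correction on the same training set; the toy script above only demonstrates the structure of such a pipeline.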

Source journal
Journal of Chemical Theory and Computation (Chemistry / Physics: Atomic, Molecular & Chemical Physics)
CiteScore: 9.90
Self-citation rate: 16.40%
Articles published: 568
Review time: 1 month
Aims and scope: The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.