Balancing Wigner sampling and geometry interpolation for deep neural networks learning photochemical reactions

Li Wang, Zhendong Li, Jingbai Li
Journal: Artificial Intelligence Chemistry
Published: 2023-10-16
DOI: 10.1016/j.aichem.2023.100018
URL: https://www.sciencedirect.com/science/article/pii/S2949747723000180

Abstract

Machine learning (ML) photodynamics simulations are revolutionary tools for resolving elusive photochemical reaction mechanisms with time-dependent, high-fidelity structural information. Despite recent advances in neural network (NN) potentials, a general rule for designing training data for learning photochemical reaction mechanisms with Wigner sampling and geometry interpolation is still lacking. We present an in-depth investigation of the relationship between the accuracy of multilayer NNs and the combinations of training data based on Wigner sampling and geometry interpolation, using model photochemical reactions of [3]-ladderdiene systems. NNs trained with Wigner sampling data show underfitting, with NN errors increasing with structural complexity and diversity. NNs trained with composite Wigner sampling and geometry interpolation data show errors reduced by an order of magnitude, suggesting an essential role of geometry interpolation in helping NNs learn the potential energy surfaces. However, increasing the number of interpolation steps results in overfitting if the Wigner-sampled configuration space is narrowed. Correlating the mean absolute errors (MAEs) of the NN-predicted energies for the sampled and out-of-sample structures shows an optimal combination ratio of 100:10 between Wigner sampling structures and geometry interpolation steps for 1000 training data, where the MAE of the sampled structures achieves chemical accuracy while the MAE of the out-of-sample structures is minimized. NNs trained with the optimally combined data can detect out-of-sample structures in adaptive sampling, with a positive correlation between the maximum standard deviation and the MAE of the predicted energies. Collectively, our findings suggest a general rule for designing training data for ML photodynamics.
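
The 100:10 combination and the two error metrics mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes, as one plausible reading of the abstract, that each Wigner-sampled structure is represented as a displacement added to every geometry on a linearly interpolated path, so that 100 displacements and 10 interpolation steps give 1000 training structures. All function names are hypothetical, and the displacements are taken as precomputed input rather than drawn from an actual Wigner distribution.

```python
import numpy as np


def interpolate_path(start, end, n_steps):
    """Linear Cartesian interpolation between two geometries.

    start, end: arrays of shape (n_atoms, 3).
    Returns an array of shape (n_steps, n_atoms, 3), endpoints included.
    """
    fractions = np.linspace(0.0, 1.0, n_steps)
    return np.array([(1.0 - t) * start + t * end for t in fractions])


def combine_wigner_and_path(displacements, path):
    """Add every displacement to every interpolated geometry (assumed scheme).

    displacements: (n_wigner, n_atoms, 3), precomputed Wigner-type displacements.
    path: (n_steps, n_atoms, 3).
    Returns (n_wigner * n_steps, n_atoms, 3).
    """
    combined = path[None, :, :, :] + displacements[:, None, :, :]
    return combined.reshape(-1, *path.shape[1:])


def mean_absolute_error(predicted, reference):
    """MAE between predicted and reference energies."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(predicted - reference)))


def max_ensemble_std(predictions):
    """Largest standard deviation across an ensemble of models per structure.

    predictions: (n_models, n_structures, n_states) predicted energies.
    Returns (n_structures,): the maximum over states of the std over models.
    """
    return np.std(predictions, axis=0).max(axis=-1)
```

With 100 displacement arrays and `n_steps=10`, `combine_wigner_and_path` returns 1000 structures, matching the training-set size quoted in the abstract. In an adaptive sampling loop, structures with a large `max_ensemble_std` value would be the candidates flagged as out-of-sample; the threshold for that flag is not given in the abstract and would need to be chosen separately.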
