A machine learning potential construction based on radial distribution function sampling

Impact factor 3.4; Zone 3 (Chemistry); JCR Q2, Chemistry, Multidisciplinary
Natsuki Watanabe, Yuta Hori, Hiroki Sugisawa, Tomonori Ida, Mitsuo Shoji, Yasuteru Shigeta
Journal of Computational Chemistry, vol. 45, no. 32, pp. 2949-2958. Published 2024-09-03.
DOI: 10.1002/jcc.27497
Article page: https://onlinelibrary.wiley.com/doi/10.1002/jcc.27497
PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/jcc.27497

Abstract

Sampling reference data is crucial in machine learning potential (MLP) construction. Inadequate coverage of local configurations in reference data may lead to unphysical behaviors in MLP-based molecular dynamics (MLP-MD) simulations. To address this problem, this study proposes a new on-the-fly reference data sampling method called radial distribution function (RDF)-based data sampling for MLP construction. This method detects and extracts anomalous structures from the trajectories of MLP-MD simulations by focusing on the shapes of RDFs. The detected structures are added to the reference data to improve the accuracy of the MLP. This method allows us to realize a reasonable MLP construction for liquid water with minimal additional data. We prepare data from an H2O molecular cluster system and verify whether the constructed MLPs are practical for bulk water systems. MLP-MD simulations without RDF-based data sampling show unphysical behaviors, such as atomic collisions. In contrast, after applying this method, we obtain MLP-MD trajectories with features, such as RDF shapes and angle distributions, that are comparable to those of ab initio MD simulations. Our simulation results demonstrate that the RDF-based data sampling approach is useful for constructing MLPs that are robust to extrapolations from molecular cluster systems to bulk systems without any specialized know-how.
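
The detection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the actual criterion applied to the RDF shape, so the example assumes a simple stand-in, namely that a frame is anomalous when its RDF has any weight below a hypothetical cutoff distance `r_min` (a proxy for atomic collisions). It also assumes a cubic periodic box. The function names `frame_rdf` and `find_anomalous_frames`, and all parameter values, are illustrative assumptions.

```python
import numpy as np

def frame_rdf(pos_a, pos_b, box, r_max, n_bins, same_species=False):
    """Per-frame RDF g(r) between two atom sets in a cubic box of side `box`.

    Assumes r_max <= box / 2 so the minimum-image convention is valid.
    """
    pos_a = np.asarray(pos_a, dtype=float)
    pos_b = np.asarray(pos_b, dtype=float)
    # Pairwise displacement vectors, wrapped to the nearest periodic image.
    diff = pos_a[:, None, :] - pos_b[None, :, :]
    diff -= box * np.round(diff / box)
    dist = np.linalg.norm(diff, axis=-1)
    if same_species:
        # Drop self-pairs (the zero-distance diagonal).
        dist = dist[~np.eye(len(pos_a), dtype=bool)]
    counts, edges = np.histogram(dist.ravel(), bins=n_bins, range=(0.0, r_max))
    r = 0.5 * (edges[:-1] + edges[1:])
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    # Expected pair count per shell for an ideal gas at the same density.
    n_b = len(pos_b) - (1 if same_species else 0)
    ideal = len(pos_a) * n_b / box ** 3 * shell_vol
    return r, counts / ideal

def find_anomalous_frames(traj_a, traj_b, box, r_min,
                          r_max=5.0, n_bins=100, same_species=False):
    """Return indices of frames whose RDF is nonzero below r_min.

    The r_min test is an assumed criterion, not the one used in the paper.
    """
    flagged = []
    for i, (pa, pb) in enumerate(zip(traj_a, traj_b)):
        r, g = frame_rdf(pa, pb, box, r_max, n_bins, same_species)
        if np.any(g[r < r_min] > 0.0):
            flagged.append(i)
    return flagged
```

In a workflow of the kind the abstract outlines, the flagged frames would then be recomputed with the reference ab initio method and appended to the training set before the MLP is refit. That retraining loop is omitted here because the abstract gives no details about it.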

Source journal: Journal of Computational Chemistry
CiteScore: 6.60
Self-citation rate: 3.30%
Publication volume: 247
Review time: 1.7 months
Journal description: This distinguished journal publishes articles concerned with all aspects of computational chemistry: analytical, biological, inorganic, organic, physical, and materials. The Journal of Computational Chemistry presents original research, contemporary developments in theory and methodology, and state-of-the-art applications. Computational areas that are featured in the journal include ab initio and semiempirical quantum mechanics, density functional theory, molecular mechanics, molecular dynamics, statistical mechanics, cheminformatics, biomolecular structure prediction, molecular design, and bioinformatics.