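Accelerating Data Set Population for Training Machine Learning Potentials with Automated System Generation and Strategic Sampling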

IF 5.5, Zone 1 (Chemistry), JCR Q2 CHEMISTRY, PHYSICAL
Alberto Pacini*, Mauro Ferrario, and Maria Clelia Righi*
Journal of Chemical Theory and Computation, 2025, 21(14), 7102–7110. Published 2025-07-02. DOI: 10.1021/acs.jctc.5c00616
Citations: 0

Abstract

Machine Learning Interatomic Potentials (MLIPs) offer a powerful way to overcome the limitations of ab initio and classical molecular dynamics (MD) simulations. However, a major challenge is generating high-quality training data sets, which typically requires extensive ab initio calculations and intensive user intervention. Here, we introduce Strategic Configuration Sampling (SCS), an active learning framework for constructing compact and comprehensive data sets for MLIP training. SCS introduces workflows for the automated generation and exploration of systems: collections of MD simulations whose geometries and run conditions are set up automatically from high-level, user-defined inputs. To explore nontrivial atomic environments, initial geometries can be assembled dynamically by collaging structures harvested from preceding runs. Multiple automated exploration workflows can run in parallel, each with its own resource budget according to the computational complexity of its system. Besides leveraging the MLIP models trained iteratively, SCS also incorporates pretrained models to steer the exploration MD, thereby eliminating the need for an initial data set. By integrating widely used software, SCS provides a fully open-source, automatic active learning framework for generating data sets in a high-throughput fashion. Case studies demonstrate its versatility and its effectiveness in accelerating the deployment of MLIPs in diverse materials science applications.
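The active learning loop summarized in the abstract (explore, select, label, retrain, with a pretrained model steering the first round so that no initial data set is needed) can be illustrated with a toy one-dimensional sketch. Everything below is a hypothetical stand-in, not the SCS implementation: `reference_energy` plays the role of the ab initio calculation, `pretrained_energy` the pretrained model, Metropolis sampling replaces exploration MD, and committee disagreement (query by committee) is a common selection criterion swapped in here because the abstract does not state which criterion SCS uses. The function names, the budget of 10 labels per iteration, and the 0.05 uncertainty threshold are all assumptions.

```python
import math
import random
import statistics

# All names and parameters here are hypothetical; this is not the SCS code.


def reference_energy(x: float) -> float:
    """Toy stand-in for an expensive ab initio calculation (a double-well potential)."""
    return x**4 - 2.0 * x**2


def pretrained_energy(x: float) -> float:
    """Toy stand-in for a pretrained model: a crude harmonic guess of the surface."""
    return 0.5 * x**2


def knn_model(data, k=3):
    """Return a k-nearest-neighbour regressor fitted to (x, energy) pairs."""
    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return sum(e for _, e in nearest) / len(nearest)
    return predict


def train_committee(dataset, n_members, rng):
    """Fit each committee member on a bootstrap resample of the labelled data."""
    return [knn_model([rng.choice(dataset) for _ in dataset]) for _ in range(n_members)]


def explore(energy, n_steps, temperature, rng, x0=0.0, step=0.5):
    """Metropolis sampling on a model energy surface (stand-in for exploration MD)."""
    x, e = x0, energy(x0)
    visited = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
        visited.append(x)
    return visited


def active_learning(n_iter=4, budget=10, threshold=0.05, seed=0):
    rng = random.Random(seed)
    dataset = []  # no initial data set: the pretrained model drives iteration 0
    committee = None
    for _ in range(n_iter):
        if committee is None:
            candidates = explore(pretrained_energy, 500, 1.0, rng)
            # Without a committee there is no uncertainty estimate, so sample at random.
            selected = rng.sample(candidates, budget)
        else:
            members = committee

            def mean_energy(x):
                return statistics.fmean(m(x) for m in members)

            candidates = explore(mean_energy, 500, 1.0, rng)
            # Score each visited configuration by committee disagreement.
            scored = [(statistics.pstdev([m(x) for m in members]), x) for x in candidates]
            uncertain = sorted((s for s in scored if s[0] > threshold), reverse=True)
            selected = [x for _, x in uncertain[:budget]]
        # "Label" the selected configurations with the reference method.
        dataset.extend((x, reference_energy(x)) for x in selected)
        committee = train_committee(dataset, n_members=4, rng=rng)
    return dataset, committee
```

With the default arguments, `active_learning()` returns the accumulated `(x, energy)` pairs (exactly 10 from the first round plus up to 10 per later round) and the final four-member committee. Collaging and per-workflow resource budgets, also described in the abstract, are not modelled in this sketch.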

Source journal
Journal of Chemical Theory and Computation (Chemistry and Physics: Atomic, Molecular and Chemical Physics)
CiteScore: 9.90
Self-citation rate: 16.40%
Articles published: 568
Review time: 1 month
About the journal: The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.