Asparagus: A toolkit for autonomous, user-guided construction of machine-learned potential energy surfaces

Impact factor 7.2 · Zone 2, Physics and Astrophysics · JCR Q1, Computer Science, Interdisciplinary Applications
Kai Töpfer, Luis Itza Vazquez-Salazar, Markus Meuwly
Computer Physics Communications, volume 308, Article 109446. Published 2024-11-22. DOI: 10.1016/j.cpc.2024.109446
https://www.sciencedirect.com/science/article/pii/S0010465524003692

Abstract

With the establishment of machine learning (ML) techniques in the scientific community, the construction of ML potential energy surfaces (ML-PES) has become a standard process in physics and chemistry. So far, improvements in the construction of ML-PES models have been conducted independently, creating an initial hurdle for new users to overcome and complicating the reproducibility of results. Aiming to reduce the bar for the extensive use of ML-PES, we introduce Asparagus, a software package encompassing the different parts into one coherent implementation that allows an autonomous, user-guided construction of ML-PES models. Asparagus combines capabilities of initial data sampling with interfaces to ab initio calculation programs, ML model training, as well as model evaluation and its application within other codes such as ASE or CHARMM. The functionalities of the code are illustrated in different examples, including the dynamics of small molecules, the representation of reactive potentials in organometallic compounds, and atom diffusion on periodic surface structures. The modular framework of Asparagus is designed to allow simple implementations of further ML-related methods and models to provide constant user-friendly access to state-of-the-art ML techniques.
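The stages named in the abstract (sampling, reference calculation, model training, evaluation) can be illustrated with a deliberately small, self-contained sketch. Nothing below is the Asparagus API: every function and class name is hypothetical, a Morse potential stands in for an ab initio program, and a linear interpolation table stands in for a neural network. It only shows how the stages chain together.

```python
import bisect
import math


def sample_geometries(n_points, r_min=0.8, r_max=2.5):
    """Sampling stage (hypothetical): a regular grid of bond lengths."""
    step = (r_max - r_min) / (n_points - 1)
    return [r_min + i * step for i in range(n_points)]


def reference_energy(r, d_e=4.5, a=2.0, r_e=1.0):
    """Reference stage: a Morse potential as a stand-in for an ab initio call."""
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2


class InterpolationModel:
    """Training stage: a toy surrogate that interpolates linearly between samples."""

    def fit(self, xs, ys):
        pairs = sorted(zip(xs, ys))
        self.xs = [p[0] for p in pairs]
        self.ys = [p[1] for p in pairs]
        return self

    def predict(self, x):
        i = bisect.bisect_left(self.xs, x)
        if i == 0:                      # at or below the first sample: clamp
            return self.ys[0]
        if i == len(self.xs):           # above the last sample: clamp
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def rmse(model, xs, ys):
    """Evaluation stage: root-mean-square error on held-out points."""
    squared = [(model.predict(x) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


def run_workflow(n_points=201):
    geometries = sample_geometries(n_points)
    energies = [reference_energy(r) for r in geometries]
    labelled = list(zip(geometries, energies))
    train, test = labelled[0::2], labelled[1::2]   # even indices train, odd test
    model = InterpolationModel().fit(*zip(*train))
    return model, rmse(model, *zip(*test))


model, test_rmse = run_workflow()
print(f"test RMSE: {test_rmse:.2e}")
```

Keeping each stage behind its own function is what makes it possible to swap one of them (for example, a different sampler or model) without touching the rest, which is the kind of modularity the abstract describes.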

Program summary

Program Title: Asparagus
CPC Library link to program files: https://doi.org/10.17632/9w9xw7mp2h.1
Developer's repository link: https://github.com/MMunibas/Asparagus
Licensing provisions: MIT
Programming language: Python
Supplementary material: Access to Documentation at https://asparagus-bundle.readthedocs.io
Nature of problem: Constructing machine-learning (ML) based potential energy surfaces (PESs) for atomistic simulations is a multi-step process that requires a broad knowledge in quantum chemistry, nuclear dynamics and programming. So far, efforts mainly focused on developing and improving ML model architectures. However, there was less effort spent on providing tools for consistent and reproducible workflows that support the construction of ML-PES for a variety of chemical systems for the broader science community.
Solution method: Asparagus is a program package written in Python that provides a streamlined and extensible workflow with a user-friendly command structure to support the construction of ML-PESs. This is achieved by bundling and linking data generation and sampling techniques, data management, model training, testing and evaluation tools into one modular, comprehensive workflow including interfaces to other simulation packages for the application of the ML-PESs. By lowering the entrance barriers especially for new users, Asparagus supports the generation and adjustment of ML-PESs that allow an increased focus on the physico-chemical evaluation of the chemical system or application in molecular dynamics simulations.
Additional comments including restrictions and unusual features: Asparagus is a modular package written in Python providing an underlying structure for further extensions and maintenance. Currently, methods based on message-passing neural network (NN) models using the PyTorch Python package are available. New models, and new interfaces to already implemented modules, can be added. The NN architecture and hyperparameters are stored in a global configuration module and as a json file for documentation. Except for essential input information, default input parameters are used unless specified otherwise, which allows both a quick setup for the construction of an ML-PES and fine-tuning for specific needs.
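The configuration behaviour described above (required inputs, defaults for everything else, and a json record of the result) can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not Asparagus code: the keys in DEFAULTS and REQUIRED and the function names are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical defaults; the real package defines its own parameter names.
DEFAULTS = {
    "batch_size": 32,
    "learning_rate": 1e-3,
    "max_epochs": 1000,
    "cutoff": 8.0,
}

# Hypothetical essential inputs that have no sensible default.
REQUIRED = ("data_file",)


def build_config(user_input):
    """Merge user input over the defaults; fail early if essentials are missing."""
    missing = [key for key in REQUIRED if key not in user_input]
    if missing:
        raise ValueError(f"missing required input: {', '.join(missing)}")
    return {**DEFAULTS, **user_input}   # user values override defaults


def save_config(config, path):
    """Write the resolved configuration to a json file for documentation."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2, sort_keys=True)


def load_config(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Only the essential input and one override are given; the rest is defaulted.
config = build_config({"data_file": "samples.db", "learning_rate": 5e-4})

with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    save_config(config, config_path)
    reloaded = load_config(config_path)

print(json.dumps(reloaded, indent=2, sort_keys=True))
```

Writing the fully resolved configuration, rather than only the user's overrides, means the json file documents every parameter that was actually used, including the defaults.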
Source journal: Computer Physics Communications (Physics; Computer Science, Interdisciplinary Applications)
CiteScore: 12.10
Self-citation rate: 3.20%
Articles published: 287
Review time: 5.3 months
About the journal: The focus of CPC is on contemporary computational methods and techniques and their implementation, the effectiveness of which will normally be evidenced by the author(s) within the context of a substantive problem in physics. Within this setting CPC publishes two types of paper.
Computer Programs in Physics (CPiP): These papers describe significant computer programs to be archived in the CPC Program Library, which is held in the Mendeley Data repository. The submitted software must be covered by an approved open source licence. Papers and associated computer programs that address a problem of contemporary interest in physics that cannot be solved by current software are particularly encouraged.
Computational Physics Papers (CP): These are research papers in, but not limited to, the following themes across computational physics and related disciplines: mathematical and numerical methods and algorithms; computational models, including those associated with the design, control and analysis of experiments; and algebraic computation. Each will normally include software implementation and performance details. The software implementation should, ideally, be available via GitHub, Zenodo or an institutional repository. In addition, research papers on the impact of advanced computer architecture and special purpose computers on computing in the physical sciences, and on software topics related to, and of importance in, the physical sciences, may be considered.