Asparagus: A toolkit for autonomous, user-guided construction of machine-learned potential energy surfaces
Kai Töpfer, Luis Itza Vazquez-Salazar, Markus Meuwly
Computer Physics Communications, Volume 308, Article 109446, published 2024-11-22
DOI: 10.1016/j.cpc.2024.109446
https://www.sciencedirect.com/science/article/pii/S0010465524003692
Abstract
With the establishment of machine learning (ML) techniques in the scientific community, the construction of ML potential energy surfaces (ML-PES) has become a standard process in physics and chemistry. So far, improvements in the construction of ML-PES models have been made independently, creating an initial hurdle for new users to overcome and complicating the reproducibility of results. Aiming to lower the bar for the extensive use of ML-PES, we introduce Asparagus, a software package that combines the different parts in one coherent implementation and allows an autonomous, user-guided construction of ML-PES models. Asparagus combines capabilities for initial data sampling with interfaces to ab initio calculation programs, ML model training, and model evaluation, as well as the application of the model within other codes such as ASE or CHARMM. The functionalities of the code are illustrated in different examples, including the dynamics of small molecules, the representation of reactive potentials in organometallic compounds, and atom diffusion on periodic surface structures. The modular framework of Asparagus is designed to allow simple implementations of further ML-related methods and models to provide constant user-friendly access to state-of-the-art ML techniques.
Program summary
Program Title: Asparagus
CPC Library link to program files: https://doi.org/10.17632/9w9xw7mp2h.1
Developer's repository link: https://github.com/MMunibas/Asparagus
Licensing provisions: MIT
Programming language: Python
Supplementary material: Access to Documentation at https://asparagus-bundle.readthedocs.io
Nature of problem: Constructing machine-learning (ML) based potential energy surfaces (PESs) for atomistic simulations is a multi-step process that requires broad knowledge of quantum chemistry, nuclear dynamics and programming. So far, efforts have mainly focused on developing and improving ML model architectures. Less effort has been spent on providing tools for consistent and reproducible workflows that support the construction of ML-PESs for a variety of chemical systems and for the broader science community.
Solution method: Asparagus is a program package written in Python that provides a streamlined and extensible workflow with a user-friendly command structure to support the construction of ML-PESs. This is achieved by bundling and linking data generation and sampling techniques, data management, model training, testing and evaluation tools into one modular, comprehensive workflow, including interfaces to other simulation packages for the application of the ML-PESs. By lowering the entrance barriers, especially for new users, Asparagus supports the generation and adjustment of ML-PESs, which allows an increased focus on the physico-chemical evaluation of the chemical system or on applications in molecular dynamics simulations.
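The chain of steps named above (sampling, reference calculation, training, evaluation) can be illustrated with a deliberately small toy example. This is a sketch under stated assumptions, not the Asparagus API: every function name below is hypothetical, the "ab initio" step is replaced by an exact harmonic bond potential, and the "model" is a single fitted force constant rather than a neural network.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Sample:
    r: float       # bond length (arbitrary units)
    energy: float  # reference energy for that geometry


def sample_geometries(n, r0, spread, seed=0):
    """Step 1: draw bond lengths around the equilibrium value r0."""
    rng = random.Random(seed)
    return [r0 + rng.uniform(-spread, spread) for _ in range(n)]


def reference_energy(r, r0=1.0, k_true=500.0):
    """Step 2: stand-in for an ab initio call (here an exact harmonic bond)."""
    return 0.5 * k_true * (r - r0) ** 2


def train(samples, r0):
    """Step 3: closed-form least-squares fit of k in E = 0.5 * k * (r - r0)**2."""
    x = [0.5 * (s.r - r0) ** 2 for s in samples]
    y = [s.energy for s in samples]
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def evaluate(k, samples, r0):
    """Step 4: root-mean-square error of the fitted model on held-out data."""
    sq = [(0.5 * k * (s.r - r0) ** 2 - s.energy) ** 2 for s in samples]
    return math.sqrt(sum(sq) / len(sq))


def run_workflow(n=40, r0=1.0, spread=0.2, n_test=10):
    """Link the four steps into one reproducible pipeline."""
    geometries = sample_geometries(n, r0, spread)
    data = [Sample(r, reference_energy(r, r0)) for r in geometries]
    train_set, test_set = data[:-n_test], data[-n_test:]
    k = train(train_set, r0)
    return k, evaluate(k, test_set, r0)


k_fit, rmse = run_workflow()
print(f"fitted force constant: {k_fit:.3f}, test RMSE: {rmse:.2e}")
```

Because each step is a separate function with a plain data interface, any one of them could be swapped (for example a different sampler or model) without touching the others, which is the kind of modularity the summary describes.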
Additional comments including restrictions and unusual features: Asparagus is a modular package written in Python providing an underlying structure for further extensions and maintenance. Currently, methods based on message-passing neural network (NN) models using the PyTorch Python package are available. Additions of new models and interfaces to already implemented modules are possible. The NN architecture and hyperparameters are stored in a global configuration module and as a json file for documentation. Except for essential input information, default input parameters are used unless specified otherwise, which allows a quick setup for the construction of an ML-PES as well as fine-tuning for specific needs.
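The configuration behaviour described above (defaults for everything except essential input, with the resolved settings written to a json file) might look roughly like the following. This is a minimal sketch using only the Python standard library; the keys, default values, and function names are hypothetical and are not taken from the Asparagus code base.

```python
import json
import os
import tempfile

# Hypothetical defaults; these keys are illustrative, not Asparagus parameters.
DEFAULTS = {
    "model_type": "message_passing_nn",
    "cutoff": 5.0,
    "learning_rate": 1e-3,
    "epochs": 100,
}
# Hypothetical essential input that has no sensible default.
REQUIRED = ("data_file",)


def build_config(user_input):
    """Merge user input over the defaults; essential keys must be given."""
    missing = [key for key in REQUIRED if key not in user_input]
    if missing:
        raise ValueError(f"missing required input: {missing}")
    return {**DEFAULTS, **user_input}


def save_config(config, path):
    """Write the resolved configuration to a JSON file for documentation."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2, sort_keys=True)


def load_config(path):
    """Read a previously saved configuration back from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Only the essential input and one override are supplied by the user.
config = build_config({"data_file": "samples.db", "epochs": 250})
config_path = os.path.join(tempfile.mkdtemp(), "config.json")
save_config(config, config_path)
restored = load_config(config_path)
```

Saving the fully resolved dictionary, rather than only the user overrides, means the file records every parameter that was actually in effect, including defaults the user never set.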