NepTrain and NepTrainKit: Automated active learning and visualization toolkit for neuroevolution potentials

IF 3.4; Zone 2 (Physics and Astrophysics); JCR Q1 (Computer Science, Interdisciplinary Applications)
Chengbing Chen, Yutong Li, Rui Zhao, Zhoulin Liu, Zheyong Fan, Gang Tang, Zhiyong Wang
Computer Physics Communications, Volume 317, Article 109859. Published 2025-09-15. DOI: 10.1016/j.cpc.2025.109859
Article page: https://www.sciencedirect.com/science/article/pii/S0010465525003613

Abstract

As a machine-learned potential, the neuroevolution potential (NEP) method features exceptional computational efficiency and has been successfully applied in materials science. Constructing high-quality training datasets is crucial for developing accurate NEP models. However, the preparation and screening of NEP training datasets remain a bottleneck for broader applications due to their time-consuming, labor-intensive, and resource-intensive nature. In this work, we have developed NepTrain and NepTrainKit, which are dedicated to initializing and managing training datasets to generate high-quality training sets while automating NEP model training. NepTrain is an open-source Python package that features a bond length filtering method to effectively identify and remove non-physical structures from molecular dynamics trajectories, thereby ensuring high-quality training datasets. NepTrainKit is a graphical user interface (GUI) software designed specifically for NEP training datasets, providing functionalities for data editing, visualization, and interactive exploration. It integrates key features such as outlier identification, farthest-point sampling, non-physical structure detection, and configuration type selection. The combination of these tools enables users to process datasets more efficiently and conveniently. Using CsPbI3 as a case study, we demonstrate the complete workflow for training NEP models with NepTrain and further validate the models through materials property predictions. We believe this toolkit will greatly benefit researchers working with machine learning interatomic potentials.
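The abstract names farthest-point sampling as one of the selection features. As a rough illustration of the general technique only (not NepTrainKit's actual implementation), the sketch below greedily picks the structure that is farthest from everything already selected. It assumes each structure has already been reduced to a fixed-length descriptor vector; the function name `farthest_point_sampling` and its parameters are hypothetical.

```python
import math


def farthest_point_sampling(points, n_samples, start=0):
    """Greedy farthest-point sampling (illustrative sketch).

    points    : list of equal-length numeric sequences, one per structure
                (assumed to be precomputed descriptor vectors)
    n_samples : maximum number of structures to select
    start     : index of the first selected point (an arbitrary choice)
    Returns the indices of the selected points, in selection order.
    """
    if not points or n_samples <= 0:
        return []
    n_samples = min(n_samples, len(points))
    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_samples:
        # The next pick is the point farthest from the current selection.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        if min_dist[nxt] == 0.0:
            break  # only duplicates of selected points remain
        selected.append(nxt)
        # Update nearest-selected distances with the new pick.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Usage: two nearly identical points (indices 0 and 1) are not both chosen.
descriptors = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(farthest_point_sampling(descriptors, 3))  # -> [0, 4, 2]
```

The greedy rule favors coverage of descriptor space over density, which is why near-duplicate configurations tend to be skipped.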

Program summary

Program Title: NepTrain and NepTrainKit
CPC Library link to program files: https://doi.org/10.17632/4s97yg7j9t.1
Developer's repository link: https://github.com/aboys-cb/NepTrain and https://github.com/aboys-cb/NepTrainKit
Licensing provisions: GPLv3
Programming language: Python
Nature of problem: The NEP method, a novel machine learning potential model, has demonstrated broad application prospects in materials science due to its excellent computational efficiency. However, the development of accurate NEP models heavily depends on the construction of high-quality training datasets. The preparation and iterative refinement of these datasets largely rely on the researcher's expertise, which poses a significant barrier for beginners attempting to use NEP and similar machine learning potential methods.
Solution method: NepTrain is an open-source Python package that features a bond length filtering method to effectively identify and remove non-physical structures from molecular dynamics trajectories, thereby ensuring high-quality training datasets. NepTrainKit is a graphical user interface (GUI) software designed specifically for NEP training datasets, providing functionalities for data editing, visualization, and interactive exploration. It integrates key features such as outlier identification, farthest-point sampling, non-physical structure detection, and configuration type selection.
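The summary does not spell out how the bond length filter decides that a structure is non-physical. One plausible reading, sketched below purely as an assumption, is to reject any frame in which two atoms sit closer than a chosen fraction of the sum of their covalent radii. The names `shortest_ratio`, `is_physical`, `filter_trajectory`, the threshold `min_ratio=0.6`, and the radii table are all hypothetical and are not taken from NepTrain; the sketch also ignores periodic boundary conditions for brevity.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstrom (illustrative values only;
# consult a reference table before relying on them).
COVALENT_RADII = {"Cs": 2.44, "Pb": 1.46, "I": 1.39}


def shortest_ratio(symbols, positions, radii=COVALENT_RADII):
    """Smallest d_ij / (r_i + r_j) over all atom pairs in one frame.

    Periodic images are NOT considered in this simplified sketch.
    """
    ratios = [
        math.dist(positions[i], positions[j])
        / (radii[symbols[i]] + radii[symbols[j]])
        for i, j in combinations(range(len(symbols)), 2)
    ]
    return min(ratios) if ratios else math.inf


def is_physical(symbols, positions, min_ratio=0.6, radii=COVALENT_RADII):
    """True if no pair is closer than min_ratio times its summed radii."""
    return shortest_ratio(symbols, positions, radii) >= min_ratio


def filter_trajectory(frames, min_ratio=0.6):
    """Keep only frames, given as (symbols, positions), that pass the check."""
    return [f for f in frames if is_physical(f[0], f[1], min_ratio)]


# Usage: the second frame has a Pb-I pair only 1.0 angstrom apart.
ok_frame = (["Pb", "I"], [(0.0, 0.0, 0.0), (3.1, 0.0, 0.0)])
bad_frame = (["Pb", "I"], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
print(len(filter_trajectory([ok_frame, bad_frame])))  # -> 1
```

A ratio-based threshold is used here so that one parameter covers all element pairs; a production filter would also need to handle periodic cells and larger systems efficiently.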
Source journal
Computer Physics Communications (Physics; Computer Science, Interdisciplinary Applications)
CiteScore: 12.10
Self-citation rate: 3.20%
Articles published: 287
Review time: 5.3 months
Journal description: The focus of CPC is on contemporary computational methods and techniques and their implementation, the effectiveness of which will normally be evidenced by the author(s) within the context of a substantive problem in physics. Within this setting CPC publishes two types of paper.
Computer Programs in Physics (CPiP): These papers describe significant computer programs to be archived in the CPC Program Library, which is held in the Mendeley Data repository. The submitted software must be covered by an approved open source licence. Papers and associated computer programs that address a problem of contemporary interest in physics that cannot be solved by current software are particularly encouraged.
Computational Physics Papers (CP): These are research papers in, but not limited to, the following themes across computational physics and related disciplines: mathematical and numerical methods and algorithms; computational models, including those associated with the design, control and analysis of experiments; and algebraic computation. Each will normally include software implementation and performance details. The software implementation should, ideally, be available via GitHub, Zenodo or an institutional repository. In addition, research papers on the impact of advanced computer architecture and special-purpose computers on computing in the physical sciences, and on software topics related to, and of importance in, the physical sciences, may be considered.