A cost-effective strategy of enhancing machine learning potentials by transfer learning from a multicomponent dataset on ænet-PyTorch

arXiv - PHYS - Disordered Systems and Neural Networks; Pub Date: 2024-08-23; DOI: arxiv-2408.12939

An Niza El Aisnadaa, Kajjana Boonpalit, Robin van der Kruit, Koen M. Draijer, Jon Lopez-Zorrilla, Masahiro Miyauchi, Akira Yamaguchi, Nongnuch Artrith



A cost-effective strategy of enhancing machine learning potentials by transfer learning from a multicomponent dataset on ænet-PyTorch

Machine learning potentials (MLPs) offer efficient and accurate material simulations, but constructing the reference ab initio database remains a significant challenge, particularly for catalyst-adsorbate systems. Training an MLP with a small dataset can lead to overfitting, thus limiting its practical applications. This study explores the feasibility of developing computationally cost-effective and accurate MLPs for catalyst-adsorbate systems with a limited number of ab initio references by leveraging a transfer learning strategy from subsets of a comprehensive public database. Using the Open Catalyst Project 2020 (OC20), a dataset closely related to our system of interest, we pre-trained MLP models on OC20 subsets using the ænet-PyTorch framework. We compared several strategies for database subset selection. Our findings indicate that MLPs constructed via transfer learning exhibit better generalizability than those constructed from scratch, as demonstrated by the consistency in the dynamics simulations. Remarkably, transfer learning enhances the stability and accuracy of MLPs for the CuAu/H2O system with approximately 600 reference data points. This approach achieved excellent extrapolation performance in molecular dynamics (MD) simulations for the larger CuAu/6H2O system, sustaining up to 250 ps, whereas MLPs without transfer learning lasted less than 50 ps. We also examine the potential limitations of this strategy. This work proposes an alternative, cost-effective approach for constructing MLPs for the challenging simulation of catalytic systems. Finally, we anticipate that this methodology will pave the way for broader applications in material science and catalysis research, facilitating more efficient and accurate simulations across various systems.
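The pre-train-then-fine-tune workflow described in the abstract can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: a one-dimensional linear model stands in for an MLP, synthetic points stand in for the OC20 subset and for the small CuAu/H2O reference set, and the helper names `mse` and `train` are hypothetical. It does not use the ænet-PyTorch API and does not reproduce the authors' training procedure. It only shows the idea of reusing weights learned on a large, related dataset as the starting point for training on a few target points.

```python
# Toy illustration only: a 1-D linear model stands in for a machine learning
# potential, and synthetic data stands in for OC20 and the target references.
# None of this is the aenet-PyTorch API.

def mse(params, xs, ys):
    """Mean squared error of the model y = w*x + b on the given points."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(params, xs, ys, steps, lr=0.1):
    """Plain gradient descent on the mean squared error; returns new (w, b)."""
    w, b = params
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

# "Source" data: many points from a related task (assumed y = 2.0*x + 1.0).
source_x = [-2.0 + 0.1 * i for i in range(41)]
source_y = [2.0 * x + 1.0 for x in source_x]

# "Target" data: only a few points from the task of interest (assumed y = 2.2*x + 0.9).
target_x = [-1.0, 0.0, 1.0]
target_y = [2.2 * x + 0.9 for x in target_x]

# Step 1: pre-train on the large source set.
pretrained = train((0.0, 0.0), source_x, source_y, steps=200)

# Step 2: fine-tune on the small target set, starting from the pre-trained weights.
fine_tuned = train(pretrained, target_x, target_y, steps=5)

# Baseline: the same training budget on the target set, starting from scratch.
from_scratch = train((0.0, 0.0), target_x, target_y, steps=5)

loss_fine_tuned = mse(fine_tuned, target_x, target_y)
loss_from_scratch = mse(from_scratch, target_x, target_y)
print("fine-tuned loss:", loss_fine_tuned)
print("from-scratch loss:", loss_from_scratch)
```

In this toy setting the fine-tuned model reaches a lower target loss than the from-scratch model under the same small training budget, because its starting weights are already close to the target solution. A real MLP is far from this convex toy problem, so the sketch says nothing about the stability or MD performance reported in the abstract.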
