An Niza El Aisnadaa, Kajjana Boonpalit, Robin van der Kruit, Koen M. Draijer, Jon Lopez-Zorrilla, Masahiro Miyauchi, Akira Yamaguchi, Nongnuch Artrith
arXiv - PHYS - Disordered Systems and Neural Networks, published 2024-08-23, doi: arxiv-2408.12939 (https://doi.org/arxiv-2408.12939)
A cost-effective strategy of enhancing machine learning potentials by transfer learning from a multicomponent dataset on ænet-PyTorch
Machine learning potentials (MLPs) offer efficient and accurate material
simulations, but constructing the reference ab initio database remains a
significant challenge, particularly for catalyst-adsorbate systems. Training an
MLP with a small dataset can lead to overfitting, thus limiting its practical
applications. This study explores the feasibility of developing computationally
cost-effective and accurate MLPs for catalyst-adsorbate systems with a limited
number of ab initio references by leveraging a transfer learning strategy from
subsets of a comprehensive public database. Using the Open Catalyst Project
2020 (OC20) -- a dataset closely related to our system of interest -- we
pre-trained MLP models on OC20 subsets using the ænet-PyTorch framework. We
compared several strategies for database subset selection. Our findings
indicate that MLPs constructed via transfer learning exhibit better
generalizability than those constructed from scratch, as demonstrated by the
consistency in the dynamics simulations. Remarkably, transfer learning enhances
the stability and accuracy of MLPs for the CuAu/H2O system with approximately
600 reference data points. This approach achieved excellent extrapolation
performance in molecular dynamics (MD) simulations for the larger CuAu/6H2O
system, sustaining up to 250 ps, whereas MLPs without transfer learning lasted
less than 50 ps. We also examine the potential limitations of this strategy.
This work proposes an alternative, cost-effective approach for constructing
MLPs for the challenging simulation of catalytic systems. Finally, we
anticipate that this methodology will pave the way for broader applications in
materials science and catalysis research, facilitating more efficient and
accurate simulations across various systems.