Investigating the error imbalance of large-scale machine learning potentials in catalysis

IF 5.4 | Zone 3, Materials Science | JCR Q2, Chemistry, Physical

ACS Applied Energy Materials | Pub Date: 2024-08-26 | DOI: 10.1039/d4cy00615a

Kareem Abdelmaqsoud, Muhammed Shuaibi, Adeesh Kolluru, Raffaele Cheula, John R. Kitchin


Abstract

Machine learning potentials (MLPs) have greatly accelerated atomistic simulations for materials discovery. The Open Catalyst 2020 (OC20) dataset is one of the largest datasets for training MLPs for heterogeneous catalysis. The mean absolute errors (MAEs) of the MLPs on the energy target of the dataset have asymptotically approached about 0.2 eV over the past two years with increasingly sophisticated models. The errors were found to be imbalanced across material classes, with non-metals having the highest errors. In this work, we investigate several potential sources of this imbalance. First, we examined material class-specific convergence errors in the density functional theory (DFT) calculations, including k-point sampling, plane-wave cutoff, and smearing width. Significant DFT convergence errors, with a mean absolute value of ∼0.15 eV, were found on the total energies of non-metals, higher than for the other material classes. However, because these errors largely cancel, convergence errors on adsorption energies have a mean absolute value of ∼0.05 eV across all material classes. Moreover, we found that the MAEs of the MLPs are not affected by these convergence errors. Second, we show that calculations with surface reconstruction can introduce inconsistencies into the adsorption energy referencing scheme that cannot be fit by the MLPs. Non-metals and halides were found to have the highest fraction of calculations with surface reconstructions. Removing calculations with surface reconstructions from the validation sets, without re-training, significantly lowers the MAEs by ∼35% and reduces the imbalance of the MAEs. Alternatively, MLPs trained on total energies provide a solution to the surface reconstruction inconsistencies, since they eliminate the referencing issue, and have MAEs comparable to those of MLPs trained on adsorption energies.
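The validation-set comparison described in the abstract (per-class MAE with and without calculations flagged for surface reconstruction) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record fields, the function name `mae_by_class`, the class labels, and the toy energies are hypothetical and are not taken from the paper or from the OC20 tooling.

```python
from collections import defaultdict


def mae_by_class(records, drop_reconstructed=False):
    """Return {material_class: mean absolute error in eV}.

    Each record is a dict with hypothetical keys:
    'material_class', 'reference' (DFT energy, eV),
    'predicted' (MLP energy, eV), and 'reconstructed' (bool flag).
    """
    abs_errors = defaultdict(list)
    for rec in records:
        # Optionally skip systems flagged as surface reconstructions,
        # mimicking removal from the validation set without re-training.
        if drop_reconstructed and rec["reconstructed"]:
            continue
        abs_errors[rec["material_class"]].append(
            abs(rec["predicted"] - rec["reference"])
        )
    return {cls: sum(errs) / len(errs) for cls, errs in abs_errors.items()}


# Toy records with made-up numbers, for illustration only.
records = [
    {"material_class": "nonmetal", "reference": -1.00,
     "predicted": -0.90, "reconstructed": False},
    {"material_class": "nonmetal", "reference": -0.50,
     "predicted": 0.20, "reconstructed": True},
    {"material_class": "intermetallic", "reference": -0.30,
     "predicted": -0.25, "reconstructed": False},
    {"material_class": "intermetallic", "reference": 0.10,
     "predicted": 0.20, "reconstructed": False},
]

all_mae = mae_by_class(records)
filtered_mae = mae_by_class(records, drop_reconstructed=True)
```

In this toy data the single flagged non-metal entry carries most of that class's error, so dropping it narrows the gap between the two classes. The sketch only shows the bookkeeping; how reconstructions are detected is not specified in the abstract and is not modeled here.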




Source journal

ACS Applied Energy Materials (Materials Science: Materials Chemistry)

CiteScore: 10.30

Self-citation rate: 6.20%

Number of publications: 1368

Journal description: ACS Applied Energy Materials is an interdisciplinary journal publishing original research covering all aspects of materials, engineering, chemistry, physics and biology relevant to energy conversion and storage. The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrate knowledge in the areas of materials, engineering, physics, bioscience, and chemistry into important energy applications.