AQCat25: unlocking spin-aware, high-fidelity machine learning potentials for heterogeneous catalysis

Omar Allam, Brook Wander, SungYeon Kim, Rudi Plesch, Tyler Sours, Jia-Min Chu, Thomas Ludwig, Jiyoon Kim, Rodrigo Wang, Shivang Agarwal, Alan Rask, Alexandre Fleury, Chuhong Wang, Andrew Wildman, Thomas Mustard, Kevin Ryczko, Paul Abruzzo, AJ Nish, Aayush R. Singh

Journal: npj Computational Materials (Q1, Chemistry, Physical; impact factor 11.9)
Publication date: 2026-04-27
Publication type: Journal Article
DOI: 10.1038/s41524-026-02099-6 (https://doi.org/10.1038/s41524-026-02099-6)
Citations: 0
Abstract
Large-scale datasets have enabled highly accurate machine learning interatomic potentials (MLIPs) for general-purpose heterogeneous catalysis modeling. There are, however, some limitations in what can be treated with these potentials because of gaps in the underlying training data. To extend these capabilities, we introduce AQCat25, a dataset of 13.5 million density functional theory (DFT) single-point calculations designed to enhance the treatment of systems where spin polarization and/or higher fidelity are critical. We also investigate integrating datasets, such as AQCat25, with the broader Open Catalyst 2020 (OC20) dataset to create spin-aware models without sacrificing generalizability. We find that directly tuning a general model on AQCat25 leads to catastrophic forgetting of the original dataset’s knowledge. Conversely, joint training strategies prove effective for improving accuracy on new distributions without sacrificing general performance. This joint approach introduces a challenge, as the model must learn from a dataset containing both mixed-fidelity calculations and mixed-physics (spin-polarized vs. unpolarized). We show that explicitly conditioning the model on this system-specific metadata, for example, by using Feature-wise Linear Modulation (FiLM), successfully addresses this challenge and further enhances accuracy. Ultimately, we establish an effective protocol for bridging DFT fidelity domains to advance the predictive power of foundational models in catalysis.
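The abstract describes conditioning the model on system-specific metadata via Feature-wise Linear Modulation (FiLM): hidden features are scaled and shifted by parameters generated from metadata such as the spin treatment or fidelity level of each calculation. The sketch below illustrates the FiLM mechanism itself in plain Python; the function names, fixed weights, and metadata flags are illustrative assumptions, not the paper's implementation (a real model would learn the metadata-to-parameter mapping).

```python
def film(features, gamma, beta):
    """Apply FiLM: an element-wise affine transform of a feature vector,
    features[i] -> gamma[i] * features[i] + beta[i]."""
    return [g * f + b for f, g, b in zip(features, gamma, beta)]

def metadata_to_film_params(spin_polarized, high_fidelity, dim):
    """Toy conditioning network: map two binary metadata flags to (gamma, beta).
    Fixed weights are used here purely for illustration; in practice this
    mapping is a small learned network."""
    s = 1.0 if spin_polarized else 0.0
    f = 1.0 if high_fidelity else 0.0
    gamma = [1.0 + 0.1 * s - 0.05 * f] * dim  # scale per feature channel
    beta = [0.2 * s + 0.1 * f] * dim          # shift per feature channel
    return gamma, beta

# Example: modulate a 3-dimensional hidden feature vector for a
# spin-polarized, standard-fidelity system.
features = [0.5, -1.0, 2.0]
gamma, beta = metadata_to_film_params(spin_polarized=True, high_fidelity=False, dim=3)
modulated = film(features, gamma, beta)
```

Because the metadata enters only through the affine parameters, the same backbone can serve both spin-polarized and unpolarized (or mixed-fidelity) data, which is the property the abstract credits for resolving the mixed-physics training challenge.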
About the journal:
npj Computational Materials is a high-quality open access journal from Nature Research that publishes research papers applying computational approaches to the design of new materials and to deepening our understanding of existing ones. The journal also welcomes papers on new computational techniques, and refinements of current approaches, that support these aims, as well as experimental papers that complement computational findings.
Key features of npj Computational Materials include a 2-year impact factor of 12.241 (2021), 1,138,590 article downloads (2021), and a fast turnaround time of 11 days from submission to first editorial decision. The journal is indexed in various databases and services, including Chemical Abstracts Service (CAS), Astrophysics Data System (ADS), Current Contents/Physical, Chemical and Earth Sciences, Journal Citation Reports/Science Edition, SCOPUS, EI Compendex, INSPEC, Google Scholar, SCImago, DOAJ, CNKI, and Science Citation Index Expanded (SCIE), among others.