AQCat25: unlocking spin-aware, high-fidelity machine learning potentials for heterogeneous catalysis

Omar Allam, Brook Wander, SungYeon Kim, Rudi Plesch, Tyler Sours, Jia-Min Chu, Thomas Ludwig, Jiyoon Kim, Rodrigo Wang, Shivang Agarwal, Alan Rask, Alexandre Fleury, Chuhong Wang, Andrew Wildman, Thomas Mustard, Kevin Ryczko, Paul Abruzzo, AJ Nish, Aayush R. Singh

Journal: npj Computational Materials (Q1, Chemistry, Physical; impact factor 11.9)
Publication date: 2026-04-27
Publication type: Journal Article
DOI: 10.1038/s41524-026-02099-6 (https://doi.org/10.1038/s41524-026-02099-6)
Citations: 0
Abstract
Large-scale datasets have enabled highly accurate machine learning interatomic potentials (MLIPs) for general-purpose heterogeneous catalysis modeling. There are, however, some limitations in what can be treated with these potentials because of gaps in the underlying training data. To extend these capabilities, we introduce AQCat25, a dataset of 13.5 million density functional theory (DFT) single-point calculations designed to enhance the treatment of systems where spin polarization and/or higher fidelity are critical. We also investigate integrating datasets, such as AQCat25, with the broader Open Catalyst 2020 (OC20) dataset to create spin-aware models without sacrificing generalizability. We find that directly tuning a general model on AQCat25 leads to catastrophic forgetting of the original dataset’s knowledge. Conversely, joint training strategies prove effective for improving accuracy on new distributions without sacrificing general performance. This joint approach introduces a challenge, as the model must learn from a dataset containing both mixed-fidelity calculations and mixed-physics (spin-polarized vs. unpolarized). We show that explicitly conditioning the model on this system-specific metadata, for example, by using Feature-wise Linear Modulation (FiLM), successfully addresses this challenge and further enhances accuracy. Ultimately, we establish an effective protocol for bridging DFT fidelity domains to advance the predictive power of foundational models in catalysis.
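The abstract describes conditioning the model on system-specific metadata via Feature-wise Linear Modulation (FiLM): hidden features are scaled and shifted by parameters generated from metadata such as the spin treatment or fidelity level of each calculation. The sketch below illustrates the FiLM mechanism itself in plain Python; the function names, fixed weights, and metadata flags are illustrative assumptions, not the paper's implementation (a real model would learn the metadata-to-parameter mapping).

```python
def film(features, gamma, beta):
    """Apply FiLM: an element-wise affine transform of a feature vector,
    features[i] -> gamma[i] * features[i] + beta[i]."""
    return [g * f + b for f, g, b in zip(features, gamma, beta)]

def metadata_to_film_params(spin_polarized, high_fidelity, dim):
    """Toy conditioning network: map two binary metadata flags to (gamma, beta).
    Fixed weights are used here purely for illustration; in practice this
    mapping is a small learned network."""
    s = 1.0 if spin_polarized else 0.0
    f = 1.0 if high_fidelity else 0.0
    gamma = [1.0 + 0.1 * s - 0.05 * f] * dim  # scale per feature channel
    beta = [0.2 * s + 0.1 * f] * dim          # shift per feature channel
    return gamma, beta

# Example: modulate a 3-dimensional hidden feature vector for a
# spin-polarized, standard-fidelity system.
features = [0.5, -1.0, 2.0]
gamma, beta = metadata_to_film_params(spin_polarized=True, high_fidelity=False, dim=3)
modulated = film(features, gamma, beta)
```

Because the metadata enters only through the affine parameters, the same backbone can serve both spin-polarized and unpolarized (or mixed-fidelity) data, which is the property the abstract credits for resolving the mixed-physics training challenge.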
About the journal:
npj Computational Materials is a high-quality open access journal from Nature Research that publishes research papers applying computational approaches to the design of new materials and to deepening our understanding of existing ones. The journal also welcomes papers on new computational techniques, and refinements of current approaches, that support these aims, as well as experimental papers that complement computational findings.
Key features of npj Computational Materials include a 2-year impact factor of 12.241 (2021), 1,138,590 article downloads (2021), and a fast turnaround time of 11 days from submission to first editorial decision. The journal is indexed in various databases and services, including Chemical Abstracts Service (CAS), Astrophysics Data System (ADS), Current Contents/Physical, Chemical and Earth Sciences, Journal Citation Reports/Science Edition, SCOPUS, EI Compendex, INSPEC, Google Scholar, SCImago, DOAJ, CNKI, and Science Citation Index Expanded (SCIE), among others.