Learning together: Towards foundation models for machine learning interatomic potentials with meta-learning

IF 9.4 | Zone 1, Materials Science | JCR Q1, CHEMISTRY, PHYSICAL

npj Computational Materials | Pub Date: 2024-07-17 | DOI: 10.1038/s41524-024-01339-x

Alice E. A. Allen, Nicholas Lubbers, Sakib Matin, Justin Smith, Richard Messerly, Sergei Tretiak, Kipton Barros


Citations: 0

Abstract

The development of machine learning models has led to an abundance of datasets containing quantum mechanical (QM) calculations for molecular and material systems. However, traditional training methods for machine learning models cannot leverage this plethora of data because they require that every dataset be generated using the same QM method. Taking machine learning interatomic potentials (MLIPs) as an example, we show that meta-learning techniques, a recent advancement from the machine learning community, can be used to fit multiple levels of QM theory in the same training process. Meta-learning changes the training procedure so that the model learns a representation that can be easily retrained for new tasks with small amounts of data. We then demonstrate that meta-learning enables simultaneous training on multiple large organic molecule datasets. As a proof of concept, we examine the performance of an MLIP refit to a small drug-like molecule and show that pre-training potentials on multiple levels of theory with meta-learning improves performance. This difference in performance can be seen both in the reduced error and in the improved smoothness of the resulting potential energy surface. We therefore show that meta-learning can utilize existing datasets with inconsistent QM levels of theory to produce models that are better at specializing to new datasets. This opens new routes for creating pre-trained foundation models for interatomic potentials.
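
The idea of pre-training on several inconsistent data sources and then refitting to a small new dataset can be illustrated with a toy sketch. The abstract does not say which meta-learning algorithm is used, so the code below assumes Reptile, a simple first-order method, as a stand-in. The linear model, the three synthetic "levels of theory" (one shared slope with different systematic offsets), and all function names and hyperparameters are hypothetical illustrations, not the authors' MLIP setup.

```python
import random

def sgd_steps(params, data, lr=0.05, steps=5):
    """Run a few full-batch gradient-descent steps on one task (mean squared error)."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            err = (w * x + b) - y
            gw += 2.0 * err * x / n
            gb += 2.0 * err / n
        w -= lr * gw
        b -= lr * gb
    return (w, b)

def mse(params, data):
    """Mean squared error of the linear model y = w*x + b on a dataset."""
    w, b = params
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)

def make_task(slope, offset, n, rng):
    """Toy dataset: y = slope*x + offset, with x drawn uniformly from [-1, 1]."""
    return [(x, slope * x + offset) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]

def reptile(tasks, rng, meta_lr=0.5, meta_iters=200):
    """Reptile outer loop: adapt to a sampled task, then move the shared
    initialization a fraction of the way toward the adapted parameters."""
    theta = (0.0, 0.0)
    for _ in range(meta_iters):
        task = rng.choice(tasks)
        phi = sgd_steps(theta, task)  # inner-loop adaptation
        theta = tuple(t + meta_lr * (p - t) for t, p in zip(theta, phi))
    return theta

rng = random.Random(0)

# Assumption: three toy "levels of theory" that share a trend but disagree by an offset.
train_tasks = [make_task(2.0, off, 50, rng) for off in (-0.2, 0.0, 0.2)]
theta = reptile(train_tasks, rng)

# New task with only five training points, evaluated on a larger held-out set.
new_train = make_task(2.0, 0.1, 5, rng)
new_test = make_task(2.0, 0.1, 200, rng)

scratch = sgd_steps((0.0, 0.0), new_train)  # same refit budget, no pre-training
adapted = sgd_steps(theta, new_train)       # refit from the meta-learned start

print("held-out MSE from scratch:      ", mse(scratch, new_test))
print("held-out MSE from meta-learning:", mse(adapted, new_test))
```

With the same small refit budget, the meta-learned initialization starts close to the shared trend, so only the task-specific offset has to be learned from the five new points. This mirrors, in a very reduced form, the claim that pre-training across levels of theory makes specialization to a new dataset easier.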


Source journal

npj Computational Materials (Mathematics-Modeling and Simulation)

CiteScore: 15.30

Self-citation rate: 5.20%

Articles published: 229

Review time: 6 weeks

About the journal: npj Computational Materials is a high-quality open access journal from Nature Research that publishes research papers applying computational approaches for the design of new materials and enhancing our understanding of existing ones. The journal also welcomes papers on new computational techniques and the refinement of current approaches that support these aims, as well as experimental papers that complement computational findings. Some key features of npj Computational Materials include a 2-year impact factor of 12.241 (2021), article downloads of 1,138,590 (2021), and a fast turnaround time of 11 days from submission to the first editorial decision. The journal is indexed in various databases and services, including Chemical Abstracts Service (ACS), Astrophysics Data System (ADS), Current Contents/Physical, Chemical and Earth Sciences, Journal Citation Reports/Science Edition, SCOPUS, EI Compendex, INSPEC, Google Scholar, SCImago, DOAJ, CNKI, and Science Citation Index Expanded (SCIE), among others.