Authors: Aparna P. A. Subramanyam, Danny Perez
Journal: npj Computational Materials
Publication date: 2025-07-07
DOI: https://doi.org/10.1038/s41524-025-01602-9
Information-entropy-driven generation of material-agnostic datasets for machine-learning interatomic potentials
In contrast to their empirical counterparts, machine-learning interatomic potentials (MLIAPs) promise to deliver near-quantum accuracy over broad regions of configuration space. However, because of their generic functional forms and extreme flexibility, they can fail catastrophically on novel, out-of-sample configurations. The quality of the training set is therefore a determining factor, especially when investigating materials under extreme conditions. We propose a novel automated dataset generation method based on maximizing the information entropy of the feature distribution, aiming at extremely broad coverage of configuration space in a way that is agnostic to the properties of specific target materials. The ability of the dataset to capture unique material properties is demonstrated on a range of unary materials, including elements with FCC (Al), BCC (W), HCP (Be, Re and Os), graphite (C), and trigonal (Sb, Te) ground states. MLIAPs trained on this dataset are shown to be accurate over a range of application-relevant metrics and extremely robust over very broad swaths of configuration space, even without dataset fine-tuning or hyper-parameter optimization. This makes the approach extremely attractive for rapidly and autonomously developing general-purpose MLIAPs suitable for simulations in extreme conditions.
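To make the idea of "information entropy of the feature distribution" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each atomic environment has already been mapped to a fixed-length feature vector, and it uses the data-dependent term of the Kozachenko-Leonenko nearest-neighbour entropy estimator as a stand-in for whatever entropy measure the paper actually uses. The function name `nn_entropy_proxy` and the synthetic "clustered" and "broad" feature sets are hypothetical, chosen only for illustration.

```python
import numpy as np


def nn_entropy_proxy(features: np.ndarray) -> float:
    """Data-dependent part of the Kozachenko-Leonenko (k=1) entropy estimate.

    Returns d * mean(log r_i), where r_i is the distance from sample i to its
    nearest neighbour. Additive constants that depend only on the number of
    samples and the dimension are dropped, so values are comparable only
    between sets of the same size and dimension.
    """
    n_samples, dim = features.shape
    # Pairwise Euclidean distances via broadcasting (fine for small sets).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each point's distance to itself
    nearest = dist.min(axis=1)
    return float(dim * np.mean(np.log(nearest + 1e-12)))


# Hypothetical feature sets: one tightly clustered (e.g. small thermal
# displacements around one structure), one spread over a wide region.
rng = np.random.default_rng(0)
clustered = rng.normal(0.0, 0.05, size=(200, 3))
broad = rng.uniform(-1.0, 1.0, size=(200, 3))

h_clustered = nn_entropy_proxy(clustered)
h_broad = nn_entropy_proxy(broad)
print(f"clustered: {h_clustered:.2f}, broad: {h_broad:.2f}")
```

Under these assumptions the broadly spread set scores higher than the clustered one, which is the property an entropy-maximizing dataset generator would exploit: configurations are favoured when they push features into sparsely populated regions, independently of any target material.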
About the journal:
npj Computational Materials is a high-quality open access journal from Nature Research that publishes research papers applying computational approaches for the design of new materials and enhancing our understanding of existing ones. The journal also welcomes papers on new computational techniques and the refinement of current approaches that support these aims, as well as experimental papers that complement computational findings.
Key features of npj Computational Materials include a 2-year impact factor of 12.241 (2021), 1,138,590 article downloads (2021), and a fast turnaround of 11 days from submission to first editorial decision. The journal is indexed in various databases and services, including Chemical Abstracts Service (CAS), Astrophysics Data System (ADS), Current Contents/Physical, Chemical and Earth Sciences, Journal Citation Reports/Science Edition, SCOPUS, EI Compendex, INSPEC, Google Scholar, SCImago, DOAJ, CNKI, and Science Citation Index Expanded (SCIE), among others.