Composition-property extrapolation for compositionally complex solid solutions based on word embeddings

Impact factor: 6.2; JCR quartile: Q1; category: Chemistry, Multidisciplinary
Lei Zhang, Lars Banko, Wolfgang Schuhmann, Alfred Ludwig and Markus Stricker
Journal: Digital Discovery, volume 6, pages 1578-1590
Published: 2025-05-19
DOI: 10.1039/D5DD00169B
Article page: https://pubs.rsc.org/en/content/articlelanding/2025/dd/d5dd00169b
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00169b?page=search
Citations: 0

Abstract

Predicting the properties of unknown materials with multiple principal elements (high-entropy alloys/compositionally complex solid solutions) is crucial for accelerating materials discovery. We present and discuss three models that use experimentally measured electrocatalytic performance data from two ternary systems (Ag–Pd–Ru and Ag–Pd–Pt) to predict electrocatalytic performance in the shared quaternary system (Ag–Pd–Pt–Ru). As a starting point, we apply Gaussian Process Regression (GPR) with composition as the feature, which includes both Ag and Pd, achieving an initial correlation coefficient for the prediction (r) of 0.63 and a coefficient of determination (r²) of 0.08. Second, we present a version of the GPR model that uses word embedding-derived materials vectors as features. These materials-specific embedding vectors significantly improve the predictions, as shown by an improved r² of 0.65. The third model is based on a ‘standard vector method’, which synthesizes weighted vector representations of material properties as features and then creates a reference vector that correlates very well with the material performance of the quaternary system (resulting r of 0.94). Our approach demonstrates that existing experimental data, combined with the latent knowledge in word embedding-derived representations of materials, can be used effectively for materials discovery, where data is typically scarce.
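
The baseline figures in the abstract pair a correlation coefficient of 0.63 with a determination coefficient (written r² here) of 0.08, although 0.63 squared is roughly 0.40. One reading, which is an assumption and not something the abstract states, is that r is the Pearson correlation and r² is the coefficient of determination, 1 - SS_res/SS_tot, which also penalises offset and scale errors. The sketch below uses invented numbers, not data from the paper, to show how the two metrics can diverge; the function names are illustrative only.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def coefficient_of_determination(y_true, y_pred):
    """r^2 = 1 - SS_res / SS_tot, computed against the measured mean."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic values for illustration only (assumption: not from the paper).
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
# Predictions that follow the trend exactly but carry a constant offset.
predicted = [t + 1.0 for t in measured]

r = pearson_r(measured, predicted)
r2 = coefficient_of_determination(measured, predicted)
print(f"Pearson r = {r:.2f}, coefficient of determination = {r2:.2f}")
# prints: Pearson r = 1.00, coefficient of determination = 0.50
```

Under this reading, a model that ranks compositions well but extrapolates with a systematic bias can show a moderate r together with an r² near zero, which is why the two numbers are worth reporting side by side.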
