Subgroup Discovery Similarity Score (SDSS): A Significant Criterion for the Integration of Statistical Knowledge into Machine Learning in Materials Science

IF 10 | Zone 2, Materials Science | JCR Q1, MATERIALS SCIENCE, MULTIDISCIPLINARY
Huiran Zhang, Mengmeng Dai, Yudian Lin, Baoyu Xu, Pin Wu, Lei Huang, Huanyu Xu, Shengzhou Li, Yan Xu, Zheng Tang, Jincang Zhang, Renchao Che, Tao Xu, Dongbo Dai
Materials Today Physics, published 2025-06-21. Journal Article. DOI: 10.1016/j.mtphys.2025.101772
Citations: 0

Abstract

In materials science research, knowledge and machine learning (ML) reinforce each other. To improve how well ML models learn from materials datasets, researchers extract statistical knowledge from ML models and integrate it into subsequent ML models in various ways. However, determining the most suitable way to integrate this statistical knowledge into the next modeling stage remains challenging, which limits the precise application of knowledge-driven approaches. In this work, the Subgroup Discovery Similarity Score (SDSS) is proposed as a key criterion for integrating statistical knowledge into ML models, with the statistical knowledge extracted from materials datasets by subgroup discovery. On the solid solution strengthening dataset, a divide-and-conquer strategy achieves a correlation coefficient of 0.96 and a mean absolute percentage error (MAPE) of 18.44%, and reveals distinct strengthening mechanisms for the face-centered cubic (FCC) and body-centered cubic (BCC) phases. On the piezoelectric coefficient dataset, statistical knowledge is encoded as features and embedded into the ML model for feature enhancement, reducing the prediction error. The results suggest that our framework can extract statistical knowledge from materials datasets and integrate it into ML models without prior domain knowledge.
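The abstract names two ways of feeding subgroup-discovery output back into a model: a divide-and-conquer strategy (separate models inside and outside a subgroup) and feature enhancement (subgroup knowledge encoded as features). The sketch below contrasts them on synthetic data and reports the two metrics quoted above, assuming the correlation coefficient is Pearson's r. It is a hypothetical illustration, not the authors' framework: the SDSS criterion is not defined in the abstract and is not reproduced here, and the names `make_data`, `in_subgroup` and `compare`, the linear base learner and the data are all assumptions.

```python
"""Hypothetical sketch: reusing one discovered subgroup in a regression model.

Not the authors' SDSS framework. Assumed: the subgroup rule is already known,
the base learner is linear least squares, and the data are synthetic.
"""
import random
from statistics import mean


def solve(a, b):
    """Solve the square system a @ w = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_linear(rows, ys):
    """Ordinary least squares via the normal equations; an intercept is added."""
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def pearson_r(y, p):
    my, mp = mean(y), mean(p)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, p))
    sy = sum((a - my) ** 2 for a in y) ** 0.5
    sp = sum((b - mp) ** 2 for b in p) ** 0.5
    return cov / (sy * sp)


def mape(y, p):
    """Mean absolute percentage error in percent (targets must be nonzero)."""
    return 100.0 * mean(abs((a - b) / a) for a, b in zip(y, p))


def make_data(n=200, seed=0):
    """Synthetic rows (x, z): the trend of y in x flips sign when z == 1."""
    rng = random.Random(seed)
    rows, ys = [], []
    for _ in range(n):
        x, z = rng.random(), rng.randint(0, 1)
        y = (10 + 5 * x) if z == 0 else (30 - 8 * x)
        rows.append((x, z))
        ys.append(y + rng.gauss(0, 0.5))
    return rows, ys


def in_subgroup(row):
    """Hypothetical rule of the kind a subgroup-discovery search might return."""
    return row[1] == 1


def compare():
    """Return {strategy: (r, MAPE%)}; metrics are in-sample for brevity."""
    rows, ys = make_data()
    # Baseline: one global model on the raw descriptor x only.
    w = fit_linear([(r[0],) for r in rows], ys)
    pred_global = [predict(w, (r[0],)) for r in rows]
    # Feature enhancement: append subgroup membership as a 0/1 feature.
    enc = [(r[0], float(in_subgroup(r))) for r in rows]
    w = fit_linear(enc, ys)
    pred_enh = [predict(w, e) for e in enc]
    # Divide and conquer: separate models inside and outside the subgroup.
    pred_dc = [0.0] * len(rows)
    for flag in (True, False):
        idx = [i for i, r in enumerate(rows) if in_subgroup(r) == flag]
        w = fit_linear([(rows[i][0],) for i in idx], [ys[i] for i in idx])
        for i in idx:
            pred_dc[i] = predict(w, (rows[i][0],))
    preds = {"global": pred_global, "enhanced": pred_enh, "divide_and_conquer": pred_dc}
    return {k: (pearson_r(ys, p), mape(ys, p)) for k, p in preds.items()}


if __name__ == "__main__":
    for name, (r, err) in compare().items():
        print(f"{name:20s} r={r:.3f}  MAPE={err:.2f}%")
```

On this synthetic target, whose slope flips inside the subgroup, splitting the data lets each model fit its own trend, while the 0/1 feature only shifts the intercept of a shared model. Which strategy suits a real dataset is the kind of choice the SDSS criterion is proposed to inform; a real comparison would also use held-out rather than in-sample metrics.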
Source journal
Materials Today Physics
Subject area: Materials Science - General Materials Science
CiteScore: 14.00
Self-citation rate: 7.80%
Articles published: 284
Review time: 15 days
Journal description: Materials Today Physics is a multidisciplinary journal focused on the physics of materials, encompassing both the physical properties and the synthesis of materials. Operating at the interface of physics and materials science, the journal covers one of the largest and most dynamic fields within physical science. Forefront research in materials physics is driving advances in new materials, uncovering new physics, and fostering novel applications at an unprecedented pace.