Subgroup Discovery Similarity Score (SDSS): A Significant Criterion for the Integration of Statistical Knowledge into Machine Learning in Materials Science
IF 10 | Zone 2, Materials Science | JCR Q1: Materials Science, Multidisciplinary
Huiran Zhang, Mengmeng Dai, Yudian Lin, Baoyu Xu, Pin Wu, Lei Huang, Huanyu Xu, Shengzhou Li, Yan Xu, Zheng Tang, Jincang Zhang, Renchao Che, Tao Xu, Dongbo Dai
Journal: Materials Today Physics
Publication date: 2025-06-21
DOI: 10.1016/j.mtphys.2025.101772 (https://doi.org/10.1016/j.mtphys.2025.101772)
Citations: 0
Abstract
In materials science research, knowledge and machine learning (ML) reinforce each other. To improve how well models learn from materials datasets, researchers extract statistical knowledge from ML models and integrate it into subsequent ML models in different ways. However, determining the most suitable integration method for the next stage remains challenging, which limits the precise application of knowledge-driven approaches. In this work, the Subgroup Discovery Similarity Score (SDSS) is proposed as a key criterion for integrating statistical knowledge into ML models. Statistical knowledge is extracted from materials datasets by subgroup discovery. On the solid solution strengthening dataset, a divide-and-conquer strategy achieves a correlation coefficient of 0.96 and a mean absolute percentage error (MAPE) of 18.44%, and reveals distinct strengthening mechanisms for the face-centered cubic (FCC) and body-centered cubic (BCC) phases. On the piezoelectric coefficient dataset, statistical knowledge is encoded as features and embedded into the ML model for feature enhancement, which reduces the prediction error. The results suggest that the framework can extract statistical knowledge from materials datasets and integrate it into ML models without prior domain knowledge.
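The two integration strategies named in the abstract, divide-and-conquer and feature enhancement, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the data are synthetic, the subgroup rule `x > 5.0` is hypothetical, plain least-squares lines stand in for the ML models, and the SDSS itself is not computed because the abstract does not define it. The sketch only shows how a discovered subgroup could either split the dataset or be encoded as an extra feature, and how the two reported metrics (correlation coefficient and MAPE) are evaluated.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply coefficients returned by fit_linear."""
    A = np.column_stack([X, np.ones(X.shape[0])])
    return A @ coef


# Synthetic data (NOT from the paper): two regimes with different trends.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=200)
in_subgroup = x > 5.0  # hypothetical rule standing in for a discovered subgroup
y = np.where(in_subgroup, 3.0 * x + 20.0, -2.0 * x + 30.0)
y = y + rng.normal(0.0, 0.5, size=x.size)
X = x.reshape(-1, 1)

# Baseline: one global model that ignores the subgroup.
pred_global = predict_linear(fit_linear(X, y), X)

# Strategy 1, divide-and-conquer: a separate model inside and outside the subgroup.
pred_dc = np.empty_like(y)
for mask in (in_subgroup, ~in_subgroup):
    pred_dc[mask] = predict_linear(fit_linear(X[mask], y[mask]), X[mask])

# Strategy 2, feature enhancement: subgroup membership as an extra input column.
X_enh = np.column_stack([x, in_subgroup.astype(float)])
pred_enh = predict_linear(fit_linear(X_enh, y), X_enh)

results = {
    "global": (pearson_r(y, pred_global), mape(y, pred_global)),
    "divide_and_conquer": (pearson_r(y, pred_dc), mape(y, pred_dc)),
    "feature_enhancement": (pearson_r(y, pred_enh), mape(y, pred_enh)),
}
for name, (r, err) in results.items():
    print(f"{name}: r={r:.3f}, MAPE={err:.2f}%")
```

In this toy setting the metrics are computed on the training data, so they only show the mechanics of each strategy; a real comparison would use held-out data and the models and criterion described in the paper.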
About the journal:
Materials Today Physics is a multi-disciplinary journal focused on the physics of materials, encompassing both the physical properties and materials synthesis. Operating at the interface of physics and materials science, this journal covers one of the largest and most dynamic fields within physical science. The forefront research in materials physics is driving advancements in new materials, uncovering new physics, and fostering novel applications at an unprecedented pace.