Integrating Materials Representations Into Feature Engineering in Machine Learning for Crystalline Materials: From Local to Global Chemistry-Structure Information Coupling
Authors: Bin Xiao, Yuchao Tang, Yi Liu
Journal: Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 15, no. 4
Published: 2025-08-11
DOI: 10.1002/wcms.70044
URL: https://wires.onlinelibrary.wiley.com/doi/10.1002/wcms.70044
Integrating materials representations into feature engineering by rational design is critical to the capability and accuracy of material property prediction via machine learning (ML). However, many existing feature models still lack a comprehensive classification and multi-dimensional evaluation that could guide model selection in applications and further development. This review systematically classifies feature construction methods for crystalline structures, emphasizing the coupling between chemical and structural information. We discuss the geometric configurations, chemical attributes, and coupling mechanisms that can be leveraged for feature engineering. Furthermore, feature models are compared across multiple aspects, including graph network representation, structural information embedding, chemistry-structure information coupling, local versus global characteristics, long-range versus short-range description, compatibility with kernel methods or deep neural networks, data size requirements, computational complexity, and interpretability mechanisms. The comparison highlights key variations among existing feature models and improves the physical interpretability of predictive models. To illustrate the integration of multi-dimensional characteristics, the center-environment (CE) feature model is introduced, which is based on the coupling between local chemical and structural information of physical core-shell structures. Within the CE model, the pre-attention mechanism shifts the focus from intricate details within complex ML algorithms to explicit feature models that depict physical core-shell configurations. By minimizing data requirements while enhancing transparency in ML models, the CE feature provides a practical approach for developing efficient and accurate ML-based predictions tailored for small-data scenarios in materials science.
This article is categorized under:
Structure and Mechanism > Computational Materials Science
Data Science > Artificial Intelligence/Machine Learning
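The abstract's idea of coupling local chemical and structural information around a center atom can be illustrated with a small sketch. This is not the authors' CE formulation, which the abstract does not specify. The function name `center_environment_feature`, the inverse-power distance weighting, and the `power` parameter are all assumptions made for illustration. The example uses Pauling electronegativities for Na and Cl and approximate neighbor distances in rock-salt NaCl.

```python
# Illustrative sketch only: a simple center/environment descriptor.
# The weighting scheme below is an assumption, not the published CE model.

# Pauling electronegativities used as the example chemical attribute.
ELECTRONEGATIVITY = {"Na": 0.93, "Cl": 3.16}


def center_environment_feature(center, neighbors, prop, power=2.0):
    """Return (center_value, environment_value) for one atomic site.

    center    : element symbol of the central ("core") atom
    neighbors : list of (element symbol, distance in angstrom) pairs
                describing the surrounding ("shell") atoms
    prop      : dict mapping element symbol -> elemental property
    power     : exponent of the assumed 1 / d**power distance weighting
    """
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    # Structural information enters through distance-based weights.
    weights = [1.0 / (dist ** power) for _, dist in neighbors]
    total = sum(weights)
    # Chemical information enters through the elemental property values.
    env_value = sum(w * prop[el] for w, (el, _) in zip(weights, neighbors)) / total
    return prop[center], env_value


# Na site in rock-salt NaCl: 6 Cl neighbors near 2.82 angstrom and
# 12 Na neighbors near 3.99 angstrom (approximate values).
na_neighbors = [("Cl", 2.82)] * 6 + [("Na", 3.99)] * 12
center_value, env_value = center_environment_feature(
    "Na", na_neighbors, ELECTRONEGATIVITY
)
print(center_value, env_value)
```

The two returned numbers form a fixed-length feature for the site: the first describes the core atom, and the second is a weighted average over the shell in which closer neighbors count more. Because the feature is explicit, each component can be read directly in physical terms, which is the kind of interpretability the abstract attributes to explicit feature models.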
About the journal:
Computational molecular sciences harness the power of rigorous chemical and physical theories, employing computer-based modeling, specialized hardware, software development, algorithm design, and database management to explore and illuminate every facet of molecular sciences. These interdisciplinary approaches form a bridge between chemistry, biology, and materials sciences, establishing connections with adjacent application-driven fields in both chemistry and biology. WIREs Computational Molecular Science stands as a platform to comprehensively review and spotlight research from these dynamic and interconnected fields.