S.B. Akinpelu , S.A. Abolade , E. Okafor , D.O. Obada , A.M. Ukpong , S. Kumar R. , J. Healy , A. Akande
Results in Physics, Volume 65, Article 107978, 2024-10-01. DOI: 10.1016/j.rinp.2024.107978
URL: https://www.sciencedirect.com/science/article/pii/S2211379724006636
Interpretable machine learning methods to predict the mechanical properties of ABX3 perovskites
This paper examines the use of interpretable ensemble learning models to predict the mechanical properties (bulk, shear and Young's moduli) of ABX3 perovskite compounds, where A, B and X denote the three elements that form the cubic three-dimensional framework of the perovskite structure. Three ensemble learning techniques are used: CatBoost, Random Forest and XGBoost. To expand the feature space, first-principles density functional theory (DFT) calculations were used to generate some of the input features, namely the elastic constants, density, volume per atom and ground-state energy per atom. We then determined how the input features rank in influencing the machine learning (ML) model decisions. To this end, we performed a correlation analysis on the multi-dimensional input feature space, suppressed highly collinear features and retained features with limited mutual correlation. The three ensemble models were trained on the resulting input feature vectors to predict the mechanical properties, and the Shapley Additive Explanations (SHAP) algorithm was used to analyse how the models arrive at their predictions. Performance was measured with error metrics and the coefficient of determination, R². The results show that XGBoost outperforms the other approaches in predicting the shear modulus and Young's modulus of the perovskite compounds, yielding the lowest errors and the highest R² value (0.97) in the testing phase. However, both CatBoost and Random Forest outperformed XGBoost in predicting the bulk modulus in the testing phase. The weaker performance of XGBoost on the bulk modulus can be ascribed to overfitting, which occurs when an ML model gives accurate predictions for the training data but not for the test data. The SHAP analysis also provides the order of feature importance (from highest to lowest). Additionally, we conducted a post-analysis using a holistic ranking to compare the relative SHAP feature importance across the examined ensemble learning techniques. Our findings indicate that the elastic constants are the most important input features influencing the predictions of the ensemble learning models.
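The abstract does not say how the reference bulk, shear and Young's moduli were obtained. For DFT datasets of cubic crystals they are commonly derived from the elastic constants C11, C12 and C44 through the Voigt-Reuss-Hill (VRH) averages. The sketch below assumes that convention; it is an illustration, not a statement of the authors' method, and the function and class names are hypothetical. If the targets were built this way, the moduli are explicit functions of the elastic constants, which would be consistent with the elastic constants topping the SHAP ranking.

```python
from dataclasses import dataclass


@dataclass
class CubicModuli:
    bulk: float   # B in GPa (Voigt and Reuss bounds coincide for cubic symmetry)
    shear: float  # G_H in GPa, the Hill average of the Voigt and Reuss shear bounds
    young: float  # E in GPa, from the isotropic relation E = 9BG / (3B + G)


def cubic_vrh_moduli(c11: float, c12: float, c44: float) -> CubicModuli:
    """Voigt-Reuss-Hill moduli of a cubic crystal from C11, C12, C44 (GPa)."""
    # Born mechanical-stability criteria for a cubic lattice
    if not (c11 - c12 > 0 and c11 + 2 * c12 > 0 and c44 > 0):
        raise ValueError("elastic constants violate the cubic Born stability criteria")
    bulk = (c11 + 2 * c12) / 3
    g_voigt = (c11 - c12 + 3 * c44) / 5
    g_reuss = 5 * (c11 - c12) * c44 / (4 * c44 + 3 * (c11 - c12))
    shear = (g_voigt + g_reuss) / 2
    young = 9 * bulk * shear / (3 * bulk + shear)
    return CubicModuli(bulk, shear, young)
```

For example, the illustrative constants C11 = 300, C12 = 100 and C44 = 100 GPa (not taken from the paper) describe an elastically isotropic crystal, for which the Voigt and Reuss shear bounds coincide and the function returns B ≈ 166.7 GPa, G = 100 GPa and E = 250 GPa.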
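The correlation screening step (suppressing highly collinear features and keeping weakly correlated ones) can be sketched as a greedy filter on pairwise Pearson coefficients. The 0.9 threshold, the first-come order in which features are considered, and the feature names in the test data are assumptions; the abstract does not specify them.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: correlation undefined, treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(columns: dict[str, list[float]], threshold: float = 0.9) -> list[str]:
    """Keep features in their given order, skipping any feature whose |r|
    with an already-kept feature exceeds `threshold`."""
    kept: list[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Because the filter is greedy, the feature listed first in a collinear pair survives; ordering candidates by domain relevance (or by correlation with the target) is a design choice the sketch leaves to the caller.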
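The abstract reports R² together with unspecified "error metrics" and attributes XGBoost's weaker bulk-modulus result to overfitting, i.e. a model that scores well on training data but not on test data. A minimal sketch of such an evaluation follows. MAE and RMSE are assumed as the error metrics, and the 0.1 train-test R² gap used to flag overfitting is an arbitrary illustrative threshold, not a value from the paper.

```python
import math


def r2_score(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def overfitting_gap(train_r2: float, test_r2: float, tol: float = 0.1) -> bool:
    """Flag a model whose training R^2 exceeds its test R^2 by more than `tol`."""
    return train_r2 - test_r2 > tol
```

With hypothetical moduli y_true = [100, 150, 200, 250] GPa and predictions [110, 140, 210, 240] GPa, every prediction is off by 10 GPa, so MAE and RMSE are both 10 GPa and R² = 1 - 400/12500 = 0.968.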
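SHAP attributes each prediction to the input features using Shapley values from cooperative game theory. The brute-force function below computes them exactly for a single prediction, replacing "absent" features with a baseline vector. It is a pedagogical stand-in, not the TreeSHAP algorithm that the shap library's TreeExplainer uses for tree ensembles such as XGBoost, CatBoost and Random Forest. The abstract does not describe the "holistic ranking" procedure, so `holistic_ranking` assumes one plausible scheme: averaging each feature's rank (by mean |SHAP| value) across the models. Both function names and the importance values in the test are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(f: Callable[[Sequence[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley value of each feature for the prediction f(x).
    Features outside a coalition take their `baseline` value.
    Needs O(2^n) model calls, so it is only viable for a handful of features."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def holistic_ranking(importances: dict[str, dict[str, float]]) -> list[str]:
    """Order features by their summed (equivalently, mean) rank across models,
    where rank 1 is the most important. `importances` maps
    model name -> {feature: mean |SHAP value|}; all models must score the same features."""
    rank_sums: dict[str, float] = {}
    for scores in importances.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, feature in enumerate(ordered, start=1):
            rank_sums[feature] = rank_sums.get(feature, 0.0) + rank
    return sorted(rank_sums, key=lambda feat: rank_sums[feat])
```

For a linear model f(z) = 2z0 + 3z1 - z2 with a zero baseline, each Shapley value reduces to w_i(x_i - b_i), and the values always sum to f(x) - f(baseline) (the efficiency property), which makes linear models a convenient sanity check.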
Results in Physics
MATERIALS SCIENCE, MULTIDISCIPLINARY; PHYSICS, MULTIDISCIPLINARY
CiteScore: 8.70
Self-citation rate: 9.40%
Articles published: 754
Review time: 50 days
About the journal:
Results in Physics is an open access journal offering authors the opportunity to publish in all fundamental and interdisciplinary areas of physics, materials science, and applied physics. Papers of a theoretical, computational, and experimental nature are all welcome. Results in Physics accepts papers that are scientifically sound, technically correct and provide valuable new knowledge to the physics community. Topics such as three-dimensional flow and magnetohydrodynamics are not within the scope of Results in Physics.
Results in Physics welcomes three types of papers:
1. Full research papers
2. Microarticles: very short papers, no longer than two pages. They may consist of a single, but well-described piece of information, such as:
- Data and/or a plot plus a description
- Description of a new method or instrumentation
- Negative results
- Concept or design study
3. Letters to the Editor: Letters discussing a recent article published in Results in Physics are welcome. These are objective, constructive, or educational critiques of papers published in Results in Physics. Accepted letters will be sent to the author of the original paper for a response. Each letter and response is published together. Letters should be received within 8 weeks of the article's publication. They should not exceed 750 words of text and 10 references.