Machine learning prediction of bandgap and formation energy in two-dimensional metal oxides
Wen Yao, Wanli Jia, Ruofan Shen, Jiayao Wang, Lin Zhang, Xinmei Wang
Physica B: Condensed Matter, Vol. 717, Article 417821 (published 2025-09-19). DOI: 10.1016/j.physb.2025.417821
Abstract
Two-dimensional (2D) transition metal oxides (TMOs), including perovskite oxides, have tunable band gaps and offer promising opportunities in optoelectronics, energy storage, catalysis, and sensing. In this work, we propose a machine learning (ML) framework for accurately predicting and analyzing the band gap and formation energy of 2D TMOs. We constructed 120 physical descriptors through comprehensive feature engineering, then selected features using Pearson correlation coefficients and feature-importance rankings. We evaluated seven ML algorithms across six prediction tasks spanning different material types, scales, and target properties. Among these algorithms, eXtreme Gradient Boosting (XGBoost) and Gradient Boosting Decision Tree (GBDT), implemented as a Gradient Boosting Classifier for classification tasks and a Gradient Boosting Regressor for regression tasks, consistently performed best. In classifying electronic band types, XGBoost achieved an accuracy of 95.4 %, and the Gradient Boosting Classifier reached 92.3 %. In regressing band gaps and formation energies, both XGBoost and the Gradient Boosting Regressor attained coefficients of determination (R²) close to 0.90. SHapley Additive exPlanations (SHAP) analysis further provided interpretability by identifying the dominant features for each property. The band gap was governed primarily by the average number of d-orbital valence electrons, the proportion of s-orbital valence electrons, the oxygen content (which varies only among the 2D oxides), and the average atomic mass. Formation energy, in contrast, correlated strongly with the electronegativity range, the oxygen content of the 2D oxides, and the average d-orbital valence electron count.
This study offers a robust, interpretable predictive approach for accelerating the screening and rational design of 2D TMOs, potentially reducing computational costs in high-throughput materials discovery workflows.
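The descriptor construction and Pearson-correlation filtering summarized above can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' pipeline. The five-element property table, the descriptor definitions (mole-fraction-weighted means, electronegativity range, fraction of s valence electrons, oxygen fraction), the helper names `descriptors` and `drop_correlated`, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
"""Hypothetical sketch: composition-averaged descriptors plus a Pearson-correlation filter.

The element table is a small illustrative subset, NOT the paper's 120-descriptor set.
"""
from itertools import combinations
from math import sqrt

# symbol: (Pauling electronegativity, atomic mass in u, s, p, d valence electrons)
# Valence counts follow ground-state configurations, e.g. Ti 4s2 3d2, Mo 5s1 4d5.
ELEMENTS = {
    "O": (3.44, 15.999, 2, 4, 0),
    "Ti": (1.54, 47.867, 2, 0, 2),
    "Sr": (0.95, 87.62, 2, 0, 0),
    "Mo": (2.16, 95.95, 1, 0, 5),
    "Zn": (1.65, 65.38, 2, 0, 10),
}


def descriptors(composition):
    """Hypothetical composition descriptors for a formula given as {element: count}."""
    total = sum(composition.values())
    frac = {el: n / total for el, n in composition.items()}

    def mean(idx):
        # Mole-fraction-weighted average of one property column.
        return sum(f * ELEMENTS[el][idx] for el, f in frac.items())

    chi = [ELEMENTS[el][0] for el in composition]
    s_avg, p_avg, d_avg = mean(2), mean(3), mean(4)
    return {
        "avg_d_valence": d_avg,
        "frac_s_valence": s_avg / (s_avg + p_avg + d_avg),
        "avg_atomic_mass": mean(1),
        "electronegativity_range": max(chi) - min(chi),
        "oxygen_fraction": frac.get("O", 0.0),
    }


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_correlated(table, threshold=0.9):
    """Greedy filter: for each pair with |r| > threshold, drop the later feature.

    Order the input columns by importance so that the more important one is kept.
    """
    names = list(table)
    dropped = set()
    for a, b in combinations(names, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(table[a], table[b])) > threshold:
            dropped.add(b)
    return [n for n in names if n not in dropped]


if __name__ == "__main__":
    formulas = {
        "SrTiO3": {"Sr": 1, "Ti": 1, "O": 3},
        "TiO2": {"Ti": 1, "O": 2},
        "MoO3": {"Mo": 1, "O": 3},
        "ZnO": {"Zn": 1, "O": 1},
    }
    rows = [descriptors(c) for c in formulas.values()]
    table = {key: [row[key] for row in rows] for key in rows[0]}
    print(drop_correlated(table, threshold=0.9))
```

The greedy filter keeps whichever feature appears first in each highly correlated pair, which is one simple way to combine correlation screening with an importance ranking. The surviving columns would then feed a tree-ensemble model such as XGBoost or GBDT.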
About the journal:
Physica B: Condensed Matter comprises all condensed matter and material physics that involve theoretical, computational and experimental work.
Papers should contain further developments and a proper discussion on the physics of experimental or theoretical results in one of the following areas:
- Magnetism
- Materials physics
- Nanostructures and nanomaterials
- Optics and optical materials
- Quantum materials
- Semiconductors
- Strongly correlated systems
- Superconductivity
- Surfaces and interfaces