Zixiong Sun, Ruyue Gao, Ping Wang, Xinying Liu, Yujie Bai, Jingyu Luo, Hongyu Yang and Wanbiao Hu
Journal of Materials Chemistry C, vol. 32, pp. 16551-16561. Published 2025-07-01. DOI: 10.1039/D5TC01781E. https://pubs.rsc.org/en/content/articlelanding/2025/tc/d5tc01781e
Accelerating the prediction of remanent polarization in multicomponent ferroelectrics by using variational autoencoder-based data augmentation
Ferroelectric capacitors have been widely studied as potential next-generation power systems, and artificial intelligence (AI) is becoming an efficient tool for discovering new material systems. Remanent polarization (P_r) is a key parameter that directly affects the energy storage density (W_rec) of capacitors, so obtaining a low P_r is important. To better handle high-dimensional, nonlinear data and to predict key parameters, this study employs a strategy that integrates data augmentation with feature selection. Based on the atomic structure, electronic configuration, and crystal structure of (K_{1−x−y−z}Na_xBa_yCa_z)(Nb_{1−u−v−w}Zr_uTi_v)O_3, we selected 46 initial features. Using a conditional variational autoencoder (CVAE), we then synthesized 20 000 new data points from 234 original samples to expand the dataset, and verified the credibility of the generated data. Finally, multiple machine learning models were trained to predict P_r; the XGBoost (XGB) model reached a coefficient of determination (R^2) of 0.94, and a series of feature selection steps identified four key descriptors that affect P_r: Martynov–Batsanov electronegativity, Shannon ionic radius, tolerance factor, and core electron distance (Schubert) of A-site elements. The model accurately predicted the properties of two ceramic systems and retained strong predictive ability even for samples containing elements beyond the original input space. This study offers insights for enriching sparse datasets in materials science via data augmentation and demonstrates an effective strategy for accelerating the prediction of remanent polarization in complex ferroelectric systems.
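Two of the descriptors named in the abstract, the Shannon ionic radius and the tolerance factor, are linked by the standard Goldschmidt relation t = (r_A + r_O) / (√2 (r_B + r_O)) for an ABO3 perovskite. The sketch below shows one way such a descriptor could be computed for a mixed composition. It is an illustration, not the authors' feature pipeline: the composition-weighted averaging of site radii, the function names, and the radius table (commonly quoted Shannon values for 12-coordinate A-site and 6-coordinate B-site ions) are all assumptions.

```python
import math

# Shannon ionic radii in angstroms. Assumed, commonly quoted values
# (A-site ions 12-coordinate, B-site ions and oxygen 6-coordinate);
# check them against a radius table before relying on them.
A_RADII = {"K": 1.64, "Na": 1.39, "Ba": 1.61, "Ca": 1.34}
B_RADII = {"Nb": 0.64, "Zr": 0.72, "Ti": 0.605}
R_OXYGEN = 1.40


def weighted_radius(fractions, radii):
    """Composition-weighted mean radius for one lattice site.

    `fractions` maps element symbol -> site fraction; fractions must sum to 1.
    Averaging by site fraction is an assumption made for this sketch.
    """
    total = sum(fractions.values())
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"site fractions sum to {total}, expected 1.0")
    return sum(frac * radii[element] for element, frac in fractions.items())


def tolerance_factor(a_site, b_site):
    """Goldschmidt tolerance factor t = (r_A + r_O) / (sqrt(2) * (r_B + r_O))."""
    r_a = weighted_radius(a_site, A_RADII)
    r_b = weighted_radius(b_site, B_RADII)
    return (r_a + R_OXYGEN) / (math.sqrt(2) * (r_b + R_OXYGEN))


if __name__ == "__main__":
    # Hypothetical composition, chosen only to exercise the function.
    t = tolerance_factor({"K": 0.5, "Na": 0.5}, {"Nb": 1.0})
    print(f"t = {t:.4f}")
```

Values of t close to 1 indicate ion sizes that fit the ideal cubic perovskite geometry; in a feature table like the one described, each composition would contribute one such number per sample.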
About the journal:
The Journal of Materials Chemistry is divided into three distinct sections, A, B, and C, each catering to specific applications of the materials under study:
Journal of Materials Chemistry A focuses primarily on materials intended for applications in energy and sustainability.
Journal of Materials Chemistry B specializes in materials designed for applications in biology and medicine.
Journal of Materials Chemistry C is dedicated to materials suitable for applications in optical, magnetic, and electronic devices.
Example topic areas within the scope of Journal of Materials Chemistry C are listed below. This list is neither exhaustive nor exclusive.
Bioelectronics
Conductors
Detectors
Dielectrics
Displays
Ferroelectrics
Lasers
LEDs
Lighting
Liquid crystals
Memory
Metamaterials
Multiferroics
Photonics
Photovoltaics
Semiconductors
Sensors
Single molecule conductors
Spintronics
Superconductors
Thermoelectrics
Topological insulators
Transistors