Data-efficient and Interpretable Inverse Materials Design using a Disentangled Variational Autoencoder

Cheng Zeng, Zulqarnain Khan, Nathan L. Post
arXiv - PHYS - Materials Science, published 2024-09-10 (arxiv-2409.06740)

Abstract

Inverse materials design has proven successful in accelerating the discovery of novel materials. Many inverse materials design methods use unsupervised learning, in which a latent space is learned to offer a compact description of materials representations. A latent space learned this way is likely to be entangled, mixing the target property with other properties of the materials, which makes the inverse design process ambiguous. Here, we present a semi-supervised learning approach based on a disentangled variational autoencoder that learns a probabilistic relationship between features, latent variables and target properties. The approach is data-efficient because it combines all labelled and unlabelled data in a coherent manner, and it uses expert-informed prior distributions to improve model robustness even when labelled data are limited. It is interpretable by construction, as the learnable target property is disentangled from the other properties of the materials, and a post-hoc analysis of the model's classification head provides an additional layer of interpretability. We demonstrate this approach on an experimental high-entropy alloy dataset, with chemical compositions as input and single-phase formation as the single target property. Although only one property is used in this work, the disentangled model can be extended to the inverse design of materials with multiple target properties.
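The abstract describes the model only at a high level. As an illustration, the following is a minimal, untrained NumPy sketch of one common way a semi-supervised VAE combines labelled and unlabelled data, in the style of the well-known M2 model (Kingma et al., 2014). It is not the authors' implementation. The target label y (single-phase formation, 1 or 0) is kept as its own variable, separate from the continuous latent z that captures the remaining variation in composition. Everything concrete here is a hypothetical assumption: the linear encoder, decoder and classifier weights, the element count `N_ELEM`, the latent size `Z_DIM`, the classification weight `alpha`, and the fixed class prior `prior_y1`, which merely stands in for an expert-informed prior. The final function shows how inverse design works in such a model: fix y = 1 and decode samples of z drawn from the prior.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ELEM, Z_DIM = 5, 2  # hypothetical: 5 candidate elements, 2-D continuous latent z

# Hypothetical, untrained linear weights; a real model would learn these.
W_cls = rng.normal(size=N_ELEM)                   # classifier head q(y=1 | x)
W_enc = rng.normal(size=(N_ELEM + 1, 2 * Z_DIM))  # encoder q(z | x, y) -> (mu, logvar)
W_dec = rng.normal(size=(Z_DIM + 1, N_ELEM))      # decoder p(x | z, y) -> element logits


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def softmax(a):
    # Row-wise softmax, so decoded outputs are valid compositions (sum to 1).
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def neg_elbo(x, y, prior_y1, eps):
    """-ELBO(x, y) for one composition x (fractions summing to 1) and label y in {0, 1}."""
    h = np.append(x, y) @ W_enc
    mu, logvar = h[:Z_DIM], h[Z_DIM:]
    z = mu + np.exp(0.5 * logvar) * eps                      # reparameterisation trick
    x_hat = softmax(np.append(z, y) @ W_dec)                  # decoded element fractions
    recon = -np.sum(x * np.log(x_hat + 1e-12))                # composition cross-entropy
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)  # KL(q(z|x,y) || N(0, I))
    log_py = np.log(prior_y1 if y == 1 else 1.0 - prior_y1)  # assumed class prior p(y)
    return recon + kl - log_py


def labelled_loss(x, y, prior_y1, alpha, eps):
    """-ELBO plus an alpha-weighted classification loss on the observed label."""
    q1 = sigmoid(x @ W_cls)
    log_qy = np.log(q1 if y == 1 else 1.0 - q1)
    return neg_elbo(x, y, prior_y1, eps) - alpha * log_qy


def unlabelled_loss(x, prior_y1, eps):
    """Marginalise the unknown label: E_q(y|x)[-ELBO(x, y)] - H(q(y|x))."""
    q1 = sigmoid(x @ W_cls)
    qy = np.array([1.0 - q1, q1])
    expected = sum(qy[y] * neg_elbo(x, y, prior_y1, eps) for y in (0, 1))
    entropy = -np.sum(qy * np.log(qy))
    return expected - entropy


def generate(n, y=1):
    """Inverse design: fix the label (y=1, single phase) and decode z ~ N(0, I)."""
    z = rng.normal(size=(n, Z_DIM))
    return softmax(np.hstack([z, np.full((n, 1), y)]) @ W_dec)


x = np.full(N_ELEM, 0.2)      # an equiatomic five-element composition
eps = rng.normal(size=Z_DIM)  # one noise draw shared by both losses
loss_lab = labelled_loss(x, 1, prior_y1=0.5, alpha=1.0, eps=eps)
loss_unl = unlabelled_loss(x, prior_y1=0.5, eps=eps)
candidates = generate(3)      # three candidate compositions conditioned on y = 1
print(loss_lab, loss_unl)
print(candidates.round(3))
```

Training such a model would minimise the sum of `labelled_loss` over labelled compositions and `unlabelled_loss` over unlabelled ones. The unlabelled term weights each possible label by the classifier's belief, so unlabelled alloys still shape both the classifier and the generative model. Because y is a separate variable in this sketch, a design query reduces to conditioning on y = 1. An entangled latent space, by contrast, offers no single variable that corresponds to the target property.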