Interpretable machine learning boosting the discovery of targeted organometallic compounds with optimal bandgap

IF 8.1 | Zone 2 (Materials Science) | Q1 Materials Science, Multidisciplinary
Taehyun Park, JunHo Song, Jinyoung Jeong, Seungpyo Kang, Joonchul Kim, Joonghee Won, Jungim Han, Kyoungmin Min
Journal: Materials Today Advances
Published: 2024-07-29
Publication type: Journal Article
DOI: 10.1016/j.mtadv.2024.100520 (https://doi.org/10.1016/j.mtadv.2024.100520)
Citations: 0

Abstract

Organometallic compounds (OMCs) have attracted considerable attention in fields such as photovoltaic cells and high-k dielectric applications because of their beneficial properties. Despite this potential, their progression into industrial applications is hindered by the limited databases available for their properties and by the absence of efficient surrogate models. To address this, the study constructs surrogate models for predicting the electronic properties of OMCs, based on optimally selected features drawn from a range of multiscale descriptors and an extensive database. High-throughput calculations were performed to obtain the electronic properties of more than 18k materials generally known as organometallics, augmenting around 12k organic materials obtained from the public open data set OMDB-GAP1. To generate features closely related to OMCs, descriptors encoding information from local to global scales were employed together with other widely used composition- and structure-based features (more than 3.5k in total). Among these descriptors, 48 critical features were identified that elucidate the physicochemical underpinnings of OMC properties and indicate their impact on those properties. The light gradient boosting machine model achieved high-accuracy predictions across the entire database with just 1% of the total descriptors, with performance comparable to that of the full descriptor set (a degradation of around 0.01 in R score and 0.01 eV in MAE). Furthermore, an active learning process was shown to find OMCs with optimal properties rapidly: expected improvement outperformed the other methods, identifying 69% of the target materials while searching only 46% of the total search space. The constructed platform, with its high-throughput calculated database, can pave the way for rapid screening of OMCs for targeted industrial applications and supports a comprehensive understanding of the intrinsic properties of OMCs and related compounds.
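The abstract names expected improvement as the best-performing active learning strategy but does not give its form. The following is a minimal sketch, assuming the standard closed-form expression under a Gaussian predictive distribution and an objective that is being maximized. The function names `expected_improvement` and `pick_next`, the exploration margin `xi`, and the candidate values are hypothetical and are not taken from the paper, whose actual surrogate, objective, and uncertainty estimate are not described here.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean: float, std: float, best_so_far: float,
                         xi: float = 0.0) -> float:
    """Expected improvement of a candidate over the best observed value.

    Assumes the surrogate's prediction is Gaussian with the given mean and
    standard deviation, and that larger objective values are better.
    """
    gain = mean - best_so_far - xi
    if std <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(gain, 0.0)
    z = gain / std
    return gain * normal_cdf(z) + std * normal_pdf(z)


def pick_next(candidates, best_so_far: float) -> str:
    """Return the name of the candidate with the highest expected improvement.

    `candidates` is a list of (name, predicted_mean, predicted_std) tuples.
    """
    best = max(candidates,
               key=lambda c: expected_improvement(c[1], c[2], best_so_far))
    return best[0]


if __name__ == "__main__":
    # Hypothetical candidates: (label, predicted score, predicted uncertainty).
    pool = [("A", 0.9, 0.01), ("B", 0.8, 0.5), ("C", 0.5, 0.1)]
    print(pick_next(pool, best_so_far=1.0))
```

In this toy pool, candidate A has the highest predicted score but almost no uncertainty, so its chance of beating the current best is negligible; candidate B is selected because its larger uncertainty leaves room for improvement. This trade-off between predicted value and uncertainty is what lets such a strategy locate target materials without exhaustively evaluating the search space.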
Source journal: Materials Today Advances (Materials Science, Multidisciplinary)
CiteScore: 14.30
Self-citation rate: 2.00%
Articles published: 116
Review time: 32 days
About the journal: Materials Today Advances is a multidisciplinary, open access journal that aims to connect different communities within materials science. It covers all aspects of materials science and related disciplines, including fundamental and applied research. The focus is on studies with broad impact that can cross traditional subject boundaries. The journal welcomes submissions at the forefront of materials science that advance the field. It is part of the Materials Today family and offers authors rigorous peer review, rapid decisions, and high visibility.