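Band Gap and Reorganization Energy Prediction of Conducting Polymers by the Integration of Machine Learning and Density Functional Theory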

IF 5.3, Zone 2 (Chemistry), JCR Q1 CHEMISTRY, MEDICINAL
Tugba Haciefendioglu and Erol Yildirim*
Journal of Chemical Information and Modeling, 65(11), 5360–5369. Journal Article, published 2025-05-28.
DOI: 10.1021/acs.jcim.5c00345
https://pubs.acs.org/doi/10.1021/acs.jcim.5c00345
Citations: 0

Abstract

The performance and reliability of machine learning (ML) quantitative structure–property relationship (QSPR) models depend on the quality, size, and diversity of the data set used for training. In this study, we manually curated a large-scale data set of 3120 donor–acceptor (D–A) conjugated polymers (CPs) built from the 60 most widely used donors and the 52 most widely used acceptors (60 × 52 = 3120 D–A pairs). This data set serves as a resource for ML-based prediction of key electronic properties, namely the band gap energy (Eg) and the hole reorganization energy (λh), both calculated with density functional theory (DFT), to advance organic photovoltaics (OPV). Beyond data set construction, we systematically investigated how different descriptor and fingerprint types affect the performance of the ML models. Because not all features contribute equally to model performance, we conducted an in-depth analysis to identify the most informative descriptors for these fundamental optoelectronic properties. Our findings show that kernel partial least-squares (KPLS) regression using radial and molprint2D fingerprints achieved the highest accuracy in predicting Eg, with R² values of 0.899 and 0.897, respectively. For λh prediction, models that incorporated electronic descriptors such as frontier orbital energy levels performed significantly better, reaching an R² of 0.830. This study provides a comprehensive investigation of how different descriptors influence model performance in OPV research. By analyzing why certain models succeed while others fail, our findings offer insight into feature selection and data set optimization for accurate target property prediction in organic electronics. The developed ML models provide a predictive framework for the design of high-performance OPV materials, substantially reducing reliance on labor-intensive experimental procedures and computationally expensive first-principles calculations.
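
The abstract reports that kernel partial least-squares (KPLS) regression on fingerprint features gave the most accurate Eg models. The radial and molprint2D fingerprint generators and the authors' exact KPLS settings are not reproduced here, so what follows is only a minimal NumPy sketch of the technique under stated assumptions: binary fingerprint bit vectors, a Tanimoto kernel, and a single-response kernel PLS in the NIPALS form described by Rosipal and Trejo, scored with R² on a held-out split. The names `tanimoto_kernel`, `KernelPLS` and `r2_score`, the six-component setting, and the synthetic 32-bit data are hypothetical illustrations, not the study's pipeline.

```python
# Hypothetical sketch: Tanimoto-kernel PLS on binary fingerprints.
# Not the study's software; kernel, data and settings are assumptions.
import numpy as np


def tanimoto_kernel(A, B):
    """Tanimoto similarity between the rows of two binary fingerprint matrices."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    inter = A @ B.T                                          # bits set in both
    union = A.sum(axis=1)[:, None] + B.sum(axis=1)[None, :] - inter
    # Two all-zero fingerprints are treated as identical (similarity 1).
    return np.divide(inter, union, out=np.ones_like(inter), where=union > 0)


def center_kernel(K_rows, K_train):
    """Center kernel rows in feature space using training-set statistics."""
    return (K_rows
            - K_rows.mean(axis=1, keepdims=True)
            - K_train.mean(axis=0)[None, :]
            + K_train.mean())


class KernelPLS:
    """Single-response kernel PLS in NIPALS form (after Rosipal and Trejo)."""

    def __init__(self, n_components=5, kernel=tanimoto_kernel):
        self.n_components = n_components
        self.kernel = kernel

    def fit(self, X, y):
        self.X_ = np.asarray(X)
        y = np.asarray(y, dtype=float)
        self.y_mean_ = y.mean()
        yc = y - self.y_mean_
        self.K_ = self.kernel(self.X_, self.X_)
        Kc = center_kernel(self.K_, self.K_)
        n = Kc.shape[0]
        Kd, yd = Kc.copy(), yc.copy()          # deflated kernel and response
        T, U = [], []
        for _ in range(self.n_components):
            # With one response the NIPALS inner loop converges immediately:
            # the y-score u is just the normalized deflated response.
            u = yd / np.linalg.norm(yd)
            t = Kd @ u
            t /= np.linalg.norm(t)
            T.append(t)
            U.append(u)
            P = np.eye(n) - np.outer(t, t)     # project out the new score
            Kd = P @ Kd @ P
            yd = yd - t * (t @ yd)
        T, U = np.column_stack(T), np.column_stack(U)
        # Dual coefficients: alpha = U (T' Kc U)^-1 T' yc
        self.alpha_ = U @ np.linalg.solve(T.T @ Kc @ U, T.T @ yc)
        self.fitted_ = T @ (T.T @ yc) + self.y_mean_   # training-set predictions
        return self

    def predict(self, X_new):
        K_new = self.kernel(np.asarray(X_new), self.X_)
        return center_kernel(K_new, self.K_) @ self.alpha_ + self.y_mean_


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    y_true = np.asarray(y_true, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data: 240 hypothetical 32-bit fingerprints and a
# made-up linear target; none of this comes from the study.
rng = np.random.default_rng(0)
X = (rng.random((240, 32)) < 0.3).astype(int)
y = 1.8 + 0.1 * (X @ rng.normal(size=32)) + rng.normal(scale=0.05, size=240)
Xtr, Xte, ytr, yte = X[:180], X[180:], y[:180], y[180:]

model = KernelPLS(n_components=6).fit(Xtr, ytr)
print(f"train R2 = {r2_score(ytr, model.fitted_):.3f}")
print(f"test  R2 = {r2_score(yte, model.predict(Xte)):.3f}")
```

Because each new score vector is orthogonal to the previous ones, the training-set R² can only rise as components are added; the held-out R² is what shows whether extra components help, so in practice the component count would be chosen by cross-validation rather than fixed.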

Source journal: Journal of Chemical Information and Modeling
CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months
Journal description: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.