Accelerated Chemical Space Generation of High Molar Extinction Organic Sensitizers via Machine Learning.

IF 3.1 | CAS Zone 4, Chemistry | JCR Q2, BIOCHEMICAL RESEARCH METHODS
Sadaf Noreen, Mamduh J Aljaafreh
Journal of Fluorescence (Journal Article), published 2025-09-18
DOI: 10.1007/s10895-025-04540-3 (https://doi.org/10.1007/s10895-025-04540-3)
Citations: 0

Abstract

The development of organic sensitizers with high molar extinction coefficients (ε) is important for many light-absorption applications. To accelerate the discovery of such compounds, machine learning (ML) is applied to explore their vast chemical space. A dataset of 676 organic chromophores is characterized with electronic, topological, and molecular descriptors, which are used to predict ε. Of the 10 ML models tested, the Gradient Boosting, Random Forest, Extra Trees, and Histogram-based Gradient Boosting regressors show good agreement between experimental and predicted values (R² ≈ 0.70). Shapley feature-importance analysis shows that two descriptors, SdsCH (the sum of E-state indices for =CH- atoms) and SlogP_VSA8 (the van der Waals surface area of atoms falling in the eighth Crippen logP-contribution bin), have the greatest influence on the predictions. Additionally, BRICS (Breaking of Retrosynthetically Interesting Chemical Substructures) analysis is used to generate 3288 novel candidate structures with potentially high ε, and their feasibility is checked through dimensionality reduction analysis. Synthetic accessibility (SA) scores are then calculated to identify the most promising structures for future experimental synthesis. Interestingly, the results indicate that new structures whose SMILES strings are 35-80 characters long show the highest SA.
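The Shapley feature-importance step described above can be illustrated with a minimal, dependency-free sketch. It computes exact Shapley values for a single prediction by enumerating every coalition of features, treating features outside a coalition as taking a fixed baseline value, and then averages the absolute values over samples to rank features globally. The three-feature `toy_model`, its coefficients, and the zero baseline are hypothetical stand-ins, not the paper's trained regressors or descriptors. This is also not the authors' implementation: SHAP tooling for tree ensembles typically averages over a background dataset rather than one baseline point and uses a polynomial-time tree algorithm instead of brute-force enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one prediction, by enumerating every coalition.

    Features outside a coalition are set to their baseline value, so the result
    satisfies efficiency: sum(phi) == model(x) - model(baseline).
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for a coalition S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Weighted marginal contribution of feature i to this coalition
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def mean_abs_shap(model: Model, samples: Sequence[Sequence[float]],
                  baseline: Sequence[float]) -> List[float]:
    """Global importance: mean |Shapley value| of each feature over the samples."""
    per_sample = [shapley_values(model, x, baseline) for x in samples]
    return [sum(abs(p[i]) for p in per_sample) / len(per_sample)
            for i in range(len(baseline))]


def toy_model(v: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained regressor (NOT the paper's model)."""
    return 2.0 * v[0] + 0.5 * v[1] + v[0] * v[2]


if __name__ == "__main__":
    phi = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
    print([round(p, 6) for p in phi])  # → [3.5, 1.0, 1.5]
```

With a zero baseline, each linear term of the toy model is credited entirely to its own feature, while the interaction term v0·v2 (worth 3 at this point) is split equally between features 0 and 2. The three values therefore sum to the prediction difference of 6.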

Source journal: Journal of Fluorescence (Chemistry - Analytical Chemistry)
CiteScore: 4.60
Self-citation rate: 7.40%
Articles published: 203
Review time: 5.4 months
Journal description: Journal of Fluorescence is an international forum for the publication of peer-reviewed original articles that advance the practice of this established spectroscopic technique. Topics covered include advances in theory and/or data analysis, studies of the photophysics of aromatic molecules, solvent, and environmental effects, development of stationary or time-resolved measurements, advances in fluorescence microscopy, imaging, photobleaching/recovery measurements, and/or phosphorescence for studies of cell biology, chemical biology and the advanced uses of fluorescence in flow cytometry/analysis, immunology, high throughput screening/drug discovery, DNA sequencing/arrays, genomics and proteomics. Typical applications might include studies of macromolecular dynamics and conformation, intracellular chemistry, and gene expression. The journal also publishes papers that describe the synthesis and characterization of new fluorophores, particularly those displaying unique sensitivities and/or optical properties. In addition to original articles, the journal also publishes reviews, rapid communications, short communications, letters to the editor, topical news articles, and technical and design notes.