Accelerated Chemical Space Generation of High Molar Extinction Organic Sensitizers via Machine Learning
Authors: Sadaf Noreen, Mamduh J Aljaafreh
Journal: Journal of Fluorescence (impact factor 3.1, JCR Q2, Biochemical Research Methods)
Publication date: 2025-09-18
DOI: https://doi.org/10.1007/s10895-025-04540-3
Citations: 0
Abstract
The development of organic sensitizers with high molar extinction coefficients (ε) is important for a range of light-absorption applications. To accelerate the discovery of such compounds, a machine learning (ML) analysis is applied to explore their vast chemical space. A dataset of 676 organic chromophores is analyzed by computing electronic, topological, and molecular descriptors to predict ε. Among the 10 ML models tested, the Gradient Boosting, Random Forest, Extra Trees, and Histogram-based Gradient Boosting regressors show good correlation between experimental and predicted values (R² ≈ 0.70). Shapley feature-importance analysis reveals that the SdsCH descriptor (the summed electrotopological-state index of =CH- carbon atoms) and the SlogP_VSA8 descriptor (a van der Waals surface area descriptor binned by atomic contributions to the logarithm of the partition coefficient) have a significant impact on model predictions. Additionally, by leveraging retrosynthetic fragmentation analysis, 3288 novel structures with potentially high ε are generated, and their feasibility is validated through dimensionality reduction analysis. Synthetic accessibility (SA) calculations identify the top structures for future experimental synthesis. Interestingly, the findings indicate that new structures with SMILES lengths of 35-80 units can exhibit the highest SA.
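The R² figure quoted in the abstract is a coefficient of determination between experimental and predicted values. As a minimal sketch of how such a score is computed, the snippet below uses only the Python standard library. The function name `r_squared` and the sample values are hypothetical and invented for illustration; they are not taken from the paper, and whether the authors fit ε directly or a transformed value such as log10(ε) is an assumption here.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = fmean(observed)
    # Residual sum of squares: error left over after the model's prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical log10(epsilon) values, invented purely for illustration
# (assumption: the paper's actual target scaling is not stated in the abstract).
experimental = [4.2, 4.5, 4.8, 5.0, 4.1, 4.6]
predicted = [4.3, 4.4, 4.6, 4.9, 4.4, 4.7]

print(f"R^2 = {r_squared(experimental, predicted):.2f}")
```

A score of 1 means the predictions reproduce the measurements exactly, while a score of 0 means the model does no better than always predicting the mean of the measured values. An R² near 0.70 therefore indicates that roughly 70% of the variance in the measured values is accounted for by the model.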
About the journal:
Journal of Fluorescence is an international forum for the publication of peer-reviewed original articles that advance the practice of this established spectroscopic technique. Topics covered include advances in theory and/or data analysis; studies of the photophysics of aromatic molecules and of solvent and environmental effects; development of stationary or time-resolved measurements; advances in fluorescence microscopy, imaging, photobleaching/recovery measurements, and/or phosphorescence for studies of cell biology and chemical biology; and the advanced uses of fluorescence in flow cytometry/analysis, immunology, high-throughput screening/drug discovery, DNA sequencing/arrays, genomics, and proteomics. Typical applications might include studies of macromolecular dynamics and conformation, intracellular chemistry, and gene expression. The journal also publishes papers that describe the synthesis and characterization of new fluorophores, particularly those displaying unique sensitivities and/or optical properties. In addition to original articles, the journal publishes reviews, rapid communications, short communications, letters to the editor, topical news articles, and technical and design notes.