High-precision prediction of fluorescence wavelength of organic based on ensemble automatic machine learning method and online querying

IF 4.1 | Zone 3 (Engineering & Technology) | Q2, CHEMISTRY, APPLIED
Shao Zhao, Jiadong Li, Lingjun Wu, Xiaoyan Zheng, Anping Tang, Wanqiang Liu
Journal: Dyes and Pigments, volume 242, Article 113012
Publication date: 2025-07-08
Publication type: Journal Article
DOI: 10.1016/j.dyepig.2025.113012
Source: https://www.sciencedirect.com/science/article/pii/S0143720825003821
Citations: 0

Abstract

Organic fluorescence is widely applied in biomedical imaging, chemical sensing, and environmental monitoring. However, the traditional trial-and-error approach to determining the emission wavelength of organic fluorescent molecules is both time-consuming and labour-intensive. Ensemble automated machine learning (AutoML) methods provide a convenient way to evaluate the fluorescence properties of organic compounds. In this work, we constructed a comprehensive fluorescence database containing 24798 organic fluorescent compounds. The maximum emission wavelengths (λem) of these compounds range from 240 nm to 1200 nm. The database was built from recent peer-reviewed publications. Molecular structures were standardized, and duplicate entries were removed. This dataset was used to train machine learning models for prediction. Among the models built with AutoGluon to predict λem, the WeightedEnsemble_L2 model performed best, with a mean absolute error (MAE) of 10 nm on the test set. Shapley additive explanations (SHAP) analysis revealed critical molecular descriptors governing λem, offering actionable insights for molecular engineering. The model was deployed as an open-access web platform (https://predixct-ednk9cynnprgqjbmskl95f.streamlit.app), enabling rapid screening of fluorophores for optoelectronic and sensing applications. This work bridges the gap between data-driven design and experimental synthesis, providing a robust tool to accelerate the development of tailored fluorescent probes for chemical sensing, bioimaging, and optical diagnostics.
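For readers unfamiliar with the metric, the mean absolute error reported in the abstract is the average of |predicted − measured| over the test compounds, expressed in the same units as λem (nm). The following is a minimal sketch in plain Python, not the authors' code; the function name and the wavelength values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mean_absolute_error(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Return the mean of |predicted - measured| over paired values (same units, e.g. nm)."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if not measured:
        raise ValueError("at least one pair of values is required")
    return sum(abs(p - m) for m, p in zip(measured, predicted)) / len(measured)


# Hypothetical emission maxima in nm, for illustration only (not from the paper's dataset).
measured_nm = [450.0, 520.0, 610.0, 700.0]
predicted_nm = [460.0, 515.0, 600.0, 715.0]

mae_nm = mean_absolute_error(measured_nm, predicted_nm)
print(f"MAE = {mae_nm:.1f} nm")  # prints "MAE = 10.0 nm"
```

An MAE of 10 nm therefore means that, averaged over the test compounds, the predicted λem differs from the measured value by 10 nm; individual errors can be larger or smaller.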
Source journal
Dyes and Pigments (Engineering & Technology: Materials Science, Textiles)
CiteScore: 8.20
Self-citation rate: 13.30%
Articles published: 933
Review time: 33 days
Journal description: Dyes and Pigments covers the scientific and technical aspects of the chemistry and physics of dyes, pigments and their intermediates. Emphasis is placed on the properties of the colouring matters themselves rather than on their applications or the system in which they may be applied. Thus the journal accepts research and review papers on the synthesis of dyes, pigments and intermediates, their physical or chemical properties, e.g. spectroscopic, surface, solution or solid state characteristics, the physical aspects of their preparation, e.g. precipitation, nucleation and growth, crystal formation, liquid crystalline characteristics, their photochemical, ecological or biological properties and the relationship between colour and chemical constitution. However, papers are considered which deal with the more fundamental aspects of colourant application and of the interactions of colourants with substrates or media. The journal will interest a wide variety of workers in a range of disciplines whose work involves dyes, pigments and their intermediates, and provides a platform for investigators with common interests but diverse fields of activity such as cosmetics, reprographics, dye and pigment synthesis, medical research, polymers, etc.