High-precision prediction of fluorescence wavelength of organic based on ensemble automatic machine learning method and online querying
Shao Zhao, Jiadong Li, Lingjun Wu, Xiaoyan Zheng, Anping Tang, Wanqiang Liu
Dyes and Pigments, Volume 242, Article 113012. Published 2025-07-08.
DOI: 10.1016/j.dyepig.2025.113012
URL: https://www.sciencedirect.com/science/article/pii/S0143720825003821
Journal impact factor: 4.1 (JCR Q2, Chemistry, Applied)
Citation count: 0
Abstract
Organic fluorescence is widely applied in biomedical imaging, chemical sensing, and environmental monitoring. However, the traditional trial-and-error method for measuring the wavelength of organic fluorescent molecules is both time-consuming and labour-intensive. Ensemble automated machine learning (AutoML) methods provide a convenient way to evaluate the fluorescence properties of organic compounds. In this work, we constructed a comprehensive fluorescence database containing 24,798 organic fluorescent compounds, whose maximum emission wavelengths (λem) range from 240 nm to 1200 nm. The database was built from recent peer-reviewed publications; molecular structures were standardized, and duplicate entries were removed. This dataset was used to train machine learning models that predict λem. Among the prediction models built with AutoGluon, the WeightedEnsemble_L2 model performed best, with a mean absolute error (MAE) of 10 nm on the test set. Shapley additive explanation (SHAP) analysis revealed critical molecular descriptors governing λem, offering actionable insights for molecular engineering. The model was deployed as an open-access web platform (https://predixct-ednk9cynnprgqjbmskl95f.streamlit.app), enabling rapid screening of fluorophores for optoelectronic and sensing applications. This work bridges the gap between data-driven design and experimental synthesis, providing a robust tool to accelerate the development of tailored fluorescent probes for chemical sensing, bioimaging, and optical diagnostics.
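To make the reported metric concrete, the following minimal sketch shows how a weighted ensemble of base-model predictions can be scored by mean absolute error. It is an illustration only, not AutoGluon's implementation of WeightedEnsemble_L2: the model names, the weights, and all wavelength values are hypothetical, and the function names `mean_absolute_error` and `weighted_ensemble` are introduced here purely for the example.

```python
from typing import Dict, List, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between measured and predicted values (nm)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_ensemble(
    predictions: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Combine per-model predictions as a weight-normalized average."""
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in predictions.items()) / total
        for i in range(n)
    ]


# Hypothetical measured emission maxima (nm) for four compounds.
measured = [450.0, 520.0, 610.0, 700.0]

# Hypothetical predictions from two base models (names are placeholders).
base_predictions = {
    "model_a": [440.0, 530.0, 600.0, 720.0],
    "model_b": [460.0, 515.0, 630.0, 690.0],
}

# Hypothetical ensemble weights; a real AutoML system would learn these.
weights = {"model_a": 0.5, "model_b": 0.5}

ensemble = weighted_ensemble(base_predictions, weights)
mae = mean_absolute_error(measured, ensemble)
print(f"ensemble MAE: {mae:.3f} nm")
```

In this toy data the averaged prediction has a lower MAE than either base model alone, which is the usual motivation for ensembling. The toy numbers say nothing about the accuracy reported in the abstract.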
About the journal:
Dyes and Pigments covers the scientific and technical aspects of the chemistry and physics of dyes, pigments and their intermediates. Emphasis is placed on the properties of the colouring matters themselves rather than on their applications or the system in which they may be applied.
Thus the journal accepts research and review papers on the synthesis of dyes, pigments and intermediates, their physical or chemical properties, e.g. spectroscopic, surface, solution or solid state characteristics, the physical aspects of their preparation, e.g. precipitation, nucleation and growth, crystal formation, liquid crystalline characteristics, their photochemical, ecological or biological properties and the relationship between colour and chemical constitution. However, papers are considered which deal with the more fundamental aspects of colourant application and of the interactions of colourants with substrates or media.
The journal will interest a wide variety of workers in a range of disciplines whose work involves dyes, pigments and their intermediates, and provides a platform for investigators with common interests but diverse fields of activity such as cosmetics, reprographics, dye and pigment synthesis, medical research, polymers, etc.