Fluor-Predictor: An Interpretable Tool for Multiproperty Prediction and Retrieval of Fluorescent Dyes
Wenxiang Song, Le Xiong, Xinmin Li, Yuyang Zhang, Binya Wang, Guixia Liu, Weihua Li, Youjun Yang* and Yun Tang*
Journal of Chemical Information and Modeling, 65(6), 2854–2867. Published 2025-03-12. DOI: 10.1021/acs.jcim.5c00127 (https://pubs.acs.org/doi/10.1021/acs.jcim.5c00127)
Citations: 0
Abstract
With the rapid advancement of fluorescent dye research, accurate prediction of optical properties and efficient retrieval of dye-related data are essential for effective dye design. However, tools for comprehensive data integration and convenient data retrieval are lacking. Moreover, existing prediction models mainly focus on a single property of fluorescent dyes and do not systematically account for the diversity of fluorophores and solvent environments. To address this, we propose Fluor-predictor, a multitask prediction model for fluorophores. This study integrates multiple dye databases and develops an interpretable graph neural network (GNN)-based multitask regression model to predict four key optical properties of fluorescent dyes. We thoroughly examined how factors such as data quality and the number of solvents affect model performance. By leveraging atom-level contribution weights, the model not only predicts these properties but also provides insights to guide structural modification. In addition, we compiled a comprehensive database containing 36,756 fluorescence property records. To address the limited accuracy of existing models for xanthene and cyanine dyes, we further compiled 1148 xanthene dye records and 1496 cyanine dye records from the literature and compared direct training with transfer learning. The model achieved mean absolute errors (MAEs) of 11.70 nm, 15.37 nm, 0.096, and 0.091 for absorption wavelength (λabs), emission wavelength (λem), quantum yield (Φ), and molar extinction coefficient (log(ε)), respectively. We integrated this work into a tool, Fluor-predictor, which supports comprehensive retrieval and multiproperty prediction. Fluor-predictor will facilitate data retrieval, prescreening, and structural modification of dyes.
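The four MAE values above are reported per property. Because a curated fluorescence record does not necessarily report all four properties, one plausible way to score a multitask regressor is to compute each property's MAE only over the records where that property was actually measured. The sketch below illustrates that masked, per-task MAE under stated assumptions: the task names are hypothetical, missing labels are represented as `None`, and log(ε) is assumed to be base 10. The abstract does not describe the authors' evaluation code, so treat this as an illustration of the metric, not as Fluor-predictor's implementation.

```python
import math
from typing import Dict, Optional, Sequence

# Hypothetical task names, in the order the abstract lists the four properties.
TASKS = ("lambda_abs_nm", "lambda_em_nm", "quantum_yield", "log_epsilon")


def to_log_epsilon(epsilon: float) -> float:
    """Convert a molar extinction coefficient (M^-1 cm^-1) to log scale.

    Assumption: base-10 logarithm, the usual convention for log(epsilon).
    """
    if epsilon <= 0:
        raise ValueError("extinction coefficient must be positive")
    return math.log10(epsilon)


def masked_mae(
    y_true: Sequence[Sequence[Optional[float]]],
    y_pred: Sequence[Sequence[float]],
) -> Dict[str, float]:
    """Per-task mean absolute error that skips missing (None) labels.

    Each row is one dye/solvent record; each column is one task in TASKS.
    A task with no labelled records returns NaN rather than a misleading 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    sums = [0.0] * len(TASKS)
    counts = [0] * len(TASKS)
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(TASKS) or len(pred_row) != len(TASKS):
            raise ValueError("each row must have one value per task")
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t is None:  # property not reported for this record
                continue
            sums[j] += abs(t - p)
            counts[j] += 1
    return {
        task: sums[j] / counts[j] if counts[j] else math.nan
        for j, task in enumerate(TASKS)
    }


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not data from the paper.
    truth = [[500.0, 520.0, 0.90, 4.9], [650.0, None, None, 5.2]]
    preds = [[510.0, 530.0, 0.80, 5.0], [640.0, 700.0, 0.50, 5.0]]
    scores = masked_mae(truth, preds)
    print({task: round(value, 3) for task, value in scores.items()})
    print(to_log_epsilon(1.0e5))
```

Masking rather than imputing keeps a sparsely reported property (here, the second record's missing λem and Φ) from being scored against invented values. The same masking idea is commonly applied to the training loss of multitask regressors, though the abstract does not state whether Fluor-predictor does so.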
About the journal:
The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery.
Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field.
As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.