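Chemical feature-based machine learning model for predicting photophysical properties of BODIPY compounds: density functional theory and quantitative structure–property relationship modeling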

IF 2.1 · Zone 4 (Chemistry) · JCR Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY
Gerardo M. Casanola-Martin, Jing Wang, Jian-ge Zhou, Bakhtiyor Rasulev, Jerzy Leszczynski
DOI: 10.1007/s00894-024-06240-4
Journal: Journal of Molecular Modeling, volume 31 1
Published: 2024-12-12
Source: https://link.springer.com/article/10.1007/s00894-024-06240-4
Citations: 0

Abstract

Chemical feature-based machine learning model for predicting photophysical properties of BODIPY compounds: density functional theory and quantitative structure–property relationship modeling

Context

Boron-dipyrromethene (BODIPY) compounds have unique photophysical properties and have been applied in fluorescence imaging, sensing, optoelectronics, and beyond. To design effective BODIPY compounds, it is crucial to understand how BODIPY structures relate to the corresponding photophysical properties. Fifteen molecular descriptors were found to correlate strongly with the maximum absorption wavelength. The resulting ML/QSPR model showed good predictive performance, with coefficients of determination (R²) of 0.945 for the training set and 0.734 for the test set, demonstrating robustness and reliability. A posterior analysis of some of the descriptors selected in the model provided insight into the structural features that influence BODIPY compound properties and emphasized the importance of molecular branching, size, and specific functional groups. This work shows that the combined cheminformatics and machine learning approach is a robust way to screen BODIPY compounds and to design novel structures with enhanced performance.
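For readers unfamiliar with the metric, the following minimal sketch shows how a coefficient of determination such as the training and test values quoted above is commonly computed. It assumes the usual definition, R² = 1 − SS_res / SS_tot, evaluated against the mean of the set being scored; the abstract does not state which variant the authors used. The function name `r_squared` and the wavelength values are hypothetical and are not data from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical absorption maxima in nm (illustrative only).
observed = [500.0, 520.0, 540.0, 560.0]
predicted = [505.0, 515.0, 545.0, 555.0]
print(round(r_squared(observed, predicted), 3))  # 0.95
```

Scoring the same model once on the compounds used for fitting and once on held-out compounds gives a training/test pair of the kind reported in the abstract.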

Methods

In the present study, all the BODIPY models were fully optimized, and the corresponding absorption spectra were obtained at the DFT/TDDFT//B3LYP/6-311G(d,p) level. All of these calculations were carried out with the Gaussian 16 program. Based on the theoretical results, a machine learning-based quantitative structure–property relationship (ML/QSPR) model was used to predict the maximum absorption wavelength (λ) of BODIPY compounds by combining hand-crafted molecular descriptors (MD) with explainable machine learning (EML) techniques from the scikit-learn Python library. A dataset of 131 BODIPY compounds with their experimental photophysical properties was used to generate a diverse set of molecular descriptors capturing the size, shape, connectivity, and other structural features of these compounds, using the Chemaxon and alvaDesc software. Genetic algorithm (GA) variable selection, together with multiple linear regression (MLR), was applied to develop the best predictive model using the Genetic Selection Python library.
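The GA-plus-MLR step described above can be sketched as follows. This is not the authors' code and it does not use the Genetic Selection or scikit-learn libraries; it is a dependency-free illustration under several assumptions: descriptor subsets have a fixed size, fitness is the R² on a held-out validation split, and the data are synthetic. All function names (`fit_ols`, `subset_score`, `ga_select`) and parameter values are hypothetical.

```python
import random


def fit_ols(rows, y):
    """Least-squares MLR coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            raise ValueError("singular design matrix")
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r2(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def subset_score(subset, X_fit, y_fit, X_val, y_val):
    """Validation R^2 of an MLR model restricted to the columns in `subset`."""
    def take(X):
        return [[row[j] for j in subset] for row in X]
    try:
        coefs = fit_ols(take(X_fit), y_fit)
    except ValueError:
        return float("-inf")
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))
             for row in take(X_val)]
    return r2(y_val, preds)


def ga_select(X_fit, y_fit, X_val, y_val, n_select,
              pop_size=12, generations=15, seed=0):
    """Search for a descriptor subset of size `n_select` with a simple GA."""
    rng = random.Random(seed)
    n_desc = len(X_fit[0])

    def score(s):
        return subset_score(s, X_fit, y_fit, X_val, y_val)

    population = [tuple(sorted(rng.sample(range(n_desc), n_select)))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(population, key=score)
        history.append(score(elite))
        children = [elite]  # elitism: the best subset always survives
        while len(children) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            # Crossover: draw the child's descriptors from the parents' union.
            child = rng.sample(sorted(set(p1) | set(p2)), n_select)
            # Mutation: occasionally swap one descriptor for an unused one.
            if rng.random() < 0.2:
                unused = [j for j in range(n_desc) if j not in child]
                if unused:
                    child[rng.randrange(n_select)] = rng.choice(unused)
            children.append(tuple(sorted(child)))
        population = children
    best = max(population, key=score)
    return best, score(best), history


# Synthetic data: 6 descriptors, of which only columns 1 and 4 drive the target.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(60)]
y = [500 + 30 * r[1] - 20 * r[4] + data_rng.gauss(0, 2) for r in X]
best, val_r2, history = ga_select(X[:45], y[:45], X[45:], y[45:], n_select=2)
print(best, round(val_r2, 3))
```

Elitism keeps the best subset of each generation, so the best fitness never decreases from one generation to the next. In practice a cross-validated score would likely be preferred over a single validation split, and a descriptor set as large as the one in the study would call for a bigger population and more generations.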

Source journal
Journal of Molecular Modeling (Chemistry: Chemistry, Multidisciplinary)
CiteScore: 3.50
Self-citation rate: 4.50%
Articles published: 362
Review time: 2.9 months
About the journal: The Journal of Molecular Modeling focuses on "hardcore" modeling, publishing high-quality research and reports. Founded in 1995 as a purely electronic journal, it has adapted its format to include a full-color print edition, and adjusted its aims and scope to fit the fast-changing field of molecular modeling, with a particular focus on three-dimensional modeling. Today, the journal covers all aspects of molecular modeling, including life science modeling, materials modeling, new methods, and computational chemistry. Topics include computer-aided molecular design; rational drug design, de novo ligand design, receptor modeling and docking; cheminformatics, data analysis, visualization and mining; computational medicinal chemistry; homology modeling; simulation of peptides, DNA and other biopolymers; quantitative structure-activity relationships (QSAR) and ADME modeling; modeling of biological reaction mechanisms; and combined experimental and computational studies in which calculations play a major role.