Synergizing GA-XGBoost and QSAR modeling: Breaking down activity cliffs in HDAC1 inhibitors

Impact factor 2.7; Zone 4 (Biology); JCR Q2 (Biochemical Research Methods)
Rahul D. Jawarkar, Suraj Mali, Prashant K. Deshmukh, Rahul G. Ingle, Sami A. Al-Hussain, Aamal A. Al-Mutairi, Magdi E.A. Zaki
Journal of Molecular Graphics & Modelling, volume 135, Article 108915. Publication date: 2024-12-20. Publication type: Journal Article.
DOI: 10.1016/j.jmgm.2024.108915
https://www.sciencedirect.com/science/article/pii/S1093326324002158
Citations: 0

Abstract

This work combines extreme gradient boosting (XGBoost) with Shapley values, a productive pairing within explainable artificial intelligence, and uses a genetic algorithm (GA) to analyse the HDAC1 inhibitory activity of a diverse set of 1274 molecules with experimentally reported HDAC1 inhibition. Building on extreme gradient boosting, the proposed method uses the genetic algorithm to identify pharmacophoric features. The XGBoost model is statistically robust, with R²(tr) = 0.8797, R²(L10%) = 0.8831, Q²(F1) = 0.9459, Q²(F2) = 0.9452, and Q²(F3) = 0.9474. The model rests on nine PyDescriptor-based molecular descriptors selected by the genetic algorithm. Shapley additive explanations (SHAP) formed the basis for interpretation, assigning an importance value to each variable in the model. Counterfactual cases were then used to analyse the impact of the selected molecular descriptors on HDAC1 inhibition. The descriptors acc_N_3B, fsp2NringC8B, fsp3NC7B, and sp2N_sp3C_3B give insight into the role the nitrogen atom plays in HDAC1 inhibitory activity. The analysis also highlights the significant role of sp2- and sp3-hybridized carbon atoms in HDAC1 inhibition. The QSAR results agree with previously reported findings, and activity cliff analysis supports them. The GA-XGBoost model is therefore both easy to interpret and reasonably predictive. This suggests that explainable AI may prove useful for identifying and exploiting structural features in drug development.
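The abstract reports Q²(F1), Q²(F2), and Q²(F3) without defining them. The sketch below assumes the external-validation definitions commonly used in the QSAR literature: all three compare the squared prediction error on the test set against a reference spread, which is the test set around the training mean (F1), the test set around its own mean (F2), or the training set around its own mean, scaled per sample (F3). The function name `external_q2` and the toy activity values are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def external_q2(y_train, y_test, y_pred):
    """Return (Q2_F1, Q2_F2, Q2_F3) for an external test set.

    Assumed definitions (standard in QSAR practice, not confirmed by the paper):
      Q2_F1 = 1 - PRESS / sum((y_test - mean(y_train))^2)
      Q2_F2 = 1 - PRESS / sum((y_test - mean(y_test))^2)
      Q2_F3 = 1 - (PRESS / n_test) / (sum((y_train - mean(y_train))^2) / n_train)
    """
    if len(y_test) != len(y_pred):
        raise ValueError("y_test and y_pred must have the same length")

    mean_train = fmean(y_train)
    mean_test = fmean(y_test)

    # Prediction error sum of squares on the external set.
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test, y_pred))

    # Reference spreads used as denominators.
    ss_test_vs_train_mean = sum((obs - mean_train) ** 2 for obs in y_test)
    ss_test_vs_test_mean = sum((obs - mean_test) ** 2 for obs in y_test)
    ss_train = sum((obs - mean_train) ** 2 for obs in y_train)

    q2_f1 = 1.0 - press / ss_test_vs_train_mean
    q2_f2 = 1.0 - press / ss_test_vs_test_mean
    q2_f3 = 1.0 - (press / len(y_test)) / (ss_train / len(y_train))
    return q2_f1, q2_f2, q2_f3


# Toy activity values, invented purely for illustration.
y_train = [5.0, 6.0, 7.0, 8.0]
y_test = [6.0, 8.0]
y_pred = [6.5, 7.5]
q2_f1, q2_f2, q2_f3 = external_q2(y_train, y_test, y_pred)
print(f"Q2_F1={q2_f1:.4f}  Q2_F2={q2_f2:.4f}  Q2_F3={q2_f3:.4f}")
```

Under these definitions the three values differ only in their denominator, so similar values, as reported in the abstract, would indicate that the training and test sets have similar means and spreads.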

Source journal
Journal of Molecular Graphics & Modelling (Biology; Computer Science, Interdisciplinary Applications)
CiteScore: 5.50
Self-citation rate: 6.90%
Articles published: 216
Review time: 35 days
About the journal: The Journal of Molecular Graphics and Modelling is devoted to the publication of papers on the uses of computers in theoretical investigations of molecular structure, function, interaction, and design. The scope of the journal includes all aspects of molecular modeling and computational chemistry, including, for instance, the study of molecular shape and properties, molecular simulations, protein and polymer engineering, drug design, materials design, structure-activity and structure-property relationships, database mining, and compound library design. As a primary research journal, JMGM seeks to bring new knowledge to the attention of its readers. As such, submissions to the journal must not only report results but also draw conclusions and explore the implications of the work presented. Authors are strongly encouraged to bear this in mind when preparing manuscripts. Routine applications of standard modelling approaches, providing only very limited new scientific insight, will not meet the journal's criteria for publication. Reproducibility of reported calculations is an important issue. Wherever possible, authors are urged to enhance their papers with Supplementary Data: for example, machine-readable versions of molecular datasets in QSAR studies, or versions of the topology and force-field parameter files in the development of new force-field parameters. Routine applications of existing methods that do not lead to genuinely new insight will not be considered.