Rahul D Jawarkar, Suraj Mali, Prashant K Deshmukh, Rahul G Ingle, Sami A Al-Hussain, Aamal A Al-Mutairi, Magdi E A Zaki
Title: Synergizing GA-XGBoost and QSAR modeling: Breaking down activity cliffs in HDAC1 inhibitors
Journal: Journal of Molecular Graphics & Modelling, volume 135, article 108915 (journal article; impact factor 2.7; JCR Q2, Biochemical Research Methods)
Published: 2024-12-20
DOI: https://doi.org/10.1016/j.jmgm.2024.108915
Citations: 0
Abstract
This work combines extreme gradient boosting (XGBoost) with Shapley values, a pairing that is gaining ground in explainable artificial intelligence, and uses a genetic algorithm (GA) to analyse the HDAC1 inhibitory activity of a diverse set of 1274 molecules with experimentally reported HDAC1 inhibition. In the proposed method, the genetic algorithm identifies pharmacophoric features, which are then modelled with extreme gradient boosting. The resulting model is statistically robust, with R²_tr = 0.8797, R²_L10% = 0.8831, Q²_F1 = 0.9459, Q²_F2 = 0.9452, and Q²_F3 = 0.9474. These statistics support the GA-derived model, which contains nine PyDescriptor descriptors. Shapley additive explanations (SHAP) formed the basis for interpretation, assigning an importance value to each variable in the model. Counterfactual cases were then used to analyse the impact of the identified molecular descriptors on HDAC1 inhibition. The descriptors acc_N_3B, fsp2NringC8B, fsp3NC7B, and sp2N_sp3C_3B give insight into the role that nitrogen atoms play in HDAC1 inhibitory activity. The analysis also highlights the significant role of sp2- and sp3-hybridized carbon atoms in HDAC1 inhibition. The QSAR results are consistent with previously reported findings, and activity cliff analysis supports them. The GA-XGBoost model is therefore both interpretable and reasonably predictive, which suggests that explainable AI may prove useful for identifying and exploiting structural features in drug development.
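The abstract quotes three external validation statistics (Q²_F1, Q²_F2, Q²_F3) without defining them. The block below is a minimal sketch, assuming the definitions commonly used in the QSAR literature: all three compare the squared prediction error on an external set against a reference spread, and they differ only in which mean and which set supply that reference. The function name `external_q2` and the toy activity values are illustrative assumptions, not taken from the paper, and this is not the authors' code.

```python
from statistics import fmean


def external_q2(y_train, y_ext, y_ext_pred):
    """Return (Q2_F1, Q2_F2, Q2_F3) for an external prediction set.

    y_train    : observed activities of the training set
    y_ext      : observed activities of the external (test) set
    y_ext_pred : model predictions for the external set
    """
    if len(y_ext) != len(y_ext_pred):
        raise ValueError("y_ext and y_ext_pred must have the same length")

    mean_train = fmean(y_train)
    mean_ext = fmean(y_ext)

    # Prediction error sum of squares on the external set.
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_ext_pred))

    # Q2_F1 reference: external observations around the TRAINING mean.
    ss_ext_vs_train_mean = sum((y - mean_train) ** 2 for y in y_ext)
    # Q2_F2 reference: external observations around the EXTERNAL mean.
    ss_ext_vs_ext_mean = sum((y - mean_ext) ** 2 for y in y_ext)
    # Q2_F3 reference: training observations around the training mean.
    ss_train = sum((y - mean_train) ** 2 for y in y_train)

    q2_f1 = 1 - press / ss_ext_vs_train_mean
    q2_f2 = 1 - press / ss_ext_vs_ext_mean
    # Q2_F3 normalises both sums by their set sizes before comparing.
    q2_f3 = 1 - (press / len(y_ext)) / (ss_train / len(y_train))
    return q2_f1, q2_f2, q2_f3


# Toy values for illustration only (not data from the study).
toy_train = [5.0, 6.0, 7.0, 8.0]
toy_ext = [6.0, 8.0]
toy_pred = [6.5, 7.5]
print(external_q2(toy_train, toy_ext, toy_pred))
```

Because Q²_F3 scales the external error by the training-set variance, it is less sensitive to how the external compounds happen to be distributed than Q²_F1 and Q²_F2, which is one reason the three values are usually reported together.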
About the journal:
The Journal of Molecular Graphics and Modelling is devoted to the publication of papers on the uses of computers in theoretical investigations of molecular structure, function, interaction, and design. The scope of the journal includes all aspects of molecular modeling and computational chemistry, including, for instance, the study of molecular shape and properties, molecular simulations, protein and polymer engineering, drug design, materials design, structure-activity and structure-property relationships, database mining, and compound library design.
As a primary research journal, JMGM seeks to bring new knowledge to the attention of our readers. Submissions to the journal must therefore not only report results but also draw conclusions and explore the implications of the work presented. Authors are strongly encouraged to bear this in mind when preparing manuscripts. Routine applications of standard modelling approaches, providing only very limited new scientific insight, will not meet our criteria for publication. Reproducibility of reported calculations is an important issue. Wherever possible, we urge authors to enhance their papers with Supplementary Data: for example, machine-readable versions of molecular datasets in QSAR studies, or the topology and force-field parameter files when new force-field parameters are developed. Routine applications of existing methods that do not lead to genuinely new insight will not be considered.