Explainable AI-driven prediction of APE1 inhibitors: enhancing cancer therapy with machine learning models and feature importance analysis.

IF 3.8 · Zone 2 (Chemistry) · JCR Q2 (CHEMISTRY, APPLIED)

Molecular Diversity Pub Date : 2025-08-01 Epub Date: 2025-02-21 DOI:10.1007/s11030-025-11133-6

Aga Basit Iqbal, Tariq Ahmad Masoodi, Ajaz A Bhat, Muzafar A Macha, Assif Assad, Syed Zubair Ahmad Shah

Molecular Diversity, pages 3371-3390. Publication type: Journal Article. Open access: no.

Citations: 0


Explainable AI-driven prediction of APE1 inhibitors: enhancing cancer therapy with machine learning models and feature importance analysis.

The viability of cells and the integrity of the genome depend on the detection and repair of damaged DNA through intricate mechanisms. Cancer treatment employs chemotherapy or radiation therapy to eliminate neoplastic cells by causing substantial damage to their DNA. In many cases, improved DNA repair mechanisms lead to resistance to these medicines; therefore, it is essential to expand efforts to develop drugs that can sensitise cells to these treatments by inhibiting the DNA repair process. Multiple studies have demonstrated a correlation between the overexpression of Apurinic/Apyrimidinic Endonuclease (APE1), the primary mammalian enzyme responsible for excising apurinic or apyrimidinic sites in DNA, and the resistance of cells to cancer therapies; in contrast, APE1 downregulation increases cellular susceptibility to DNA-damaging agents. Thus, the effectiveness of existing therapies can be improved by promoting the targeted sensitization of cancer cells while protecting healthy cells. The current study aims to employ explainable artificial intelligence (XAI) to enhance the accuracy and reliability of machine learning models for the prediction of APE1 inhibitors. Various ML-based regression models are employed to predict the pIC50 value of different medicines. Bayesian optimization and the Permutation Feature Importance (PFI) approach are employed to determine the best hyperparameters of machine learning models and to discover the most significant features for recognizing drug candidates that target APE1 enzymes, respectively. To acquire comprehensive elucidations for the predictive models in our research, two XAI methodologies, namely SHAP and LIME, are used. The SHAP analysis reveals that the features 'C1SP2' and 'ASP-2' are essential in influencing the model's predictions. The SHAP values demonstrate variability for features such as 'maxHBint2' and 'GATS1s,' signifying that their impact is dependent on specific instances within the dataset. 
The LIME study corroborates these findings, demonstrating that 'C1SP2' and 'ASP-2' are the most significant positive contributors, whereas features like 'SHCHnX,' 'nHdCH2,' and 'GATS1s' result in a decrease in the predicted values. Due to the limited sample size of the APE1 dataset, direct training on this dataset posed challenges in model generalization and reliability. To overcome this limitation, the BACE-1 dataset is leveraged for model training, enabling the ML models to learn from a more extensive and diverse chemical space. Among the tested algorithms, XGBoost demonstrated superior predictive performance, achieving R² = 0.890, MAE = 0.186, and RMSE = 0.245, significantly surpassing state-of-the-art methods, such as LightGBM and QSAR-ML, which attained R² scores of 0.798 and 0.630, respectively. These results highlight the robustness of our approach, demonstrating its enhanced generalization capability and superior predictive accuracy compared to existing methodologies.
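The abstract reports R², MAE, and RMSE for pIC50 regression and names Permutation Feature Importance (PFI) as the feature-ranking method. The sketch below shows, under stated assumptions, how those quantities are conventionally computed. It is not the authors' code: `toy_predict` is a hypothetical stand-in for a trained regressor such as XGBoost, the two-column feature matrix is invented for illustration, and the nM-based pIC50 conversion assumes IC50 values are given in nanomolar.

```python
import math
import random

def pic50_from_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); with IC50 in nM this is 9 - log10(IC50)."""
    return 9.0 - math.log10(ic50_nm)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in RMSE when a single feature column is shuffled."""
    rng = random.Random(seed)
    base = rmse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            total += rmse(y, [predict(r) for r in X_perm]) - base
        scores.append(total / n_repeats)
    return scores

def toy_predict(row):
    """Hypothetical model: depends on feature 0 only, ignores feature 1."""
    return 5.0 + 2.0 * row[0] + 0.0 * row[1]

# Invented toy data: two descriptor columns, targets generated by the toy model.
X = [[0.0, 3.0], [1.0, 1.0], [2.0, 4.0], [3.0, 1.0], [4.0, 5.0]]
y = [toy_predict(row) for row in X]
importances = permutation_importance(toy_predict, X, y)
```

Here `importances[0]` is positive because shuffling the first column degrades the predictions, while `importances[1]` is zero because the toy model never uses the second column. In a real workflow the score would be measured on held-out data rather than on the training set.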


Source journal

Molecular Diversity (Chemistry: Multidisciplinary Chemistry)

CiteScore: 7.30

Self-citation rate: 7.90%

Articles published: 219

Review time: 2.7 months

Journal description: Molecular Diversity is a new publication forum for the rapid publication of refereed papers dedicated to describing the development, application and theory of molecular diversity and combinatorial chemistry in basic and applied research and drug discovery. The journal publishes both short and full papers, perspectives, news and reviews dealing with all aspects of the generation of molecular diversity, application of diversity for screening against alternative targets of all types (biological, biophysical, technological), analysis of results obtained and their application in various scientific disciplines/approaches including: combinatorial chemistry and parallel synthesis; small molecule libraries; microwave synthesis; flow synthesis; fluorous synthesis; diversity oriented synthesis (DOS); nanoreactors; click chemistry; multiplex technologies; fragment- and ligand-based design; structure/function/SAR; computational chemistry and molecular design; chemoinformatics; screening techniques and screening interfaces; analytical and purification methods; robotics, automation and miniaturization; targeted libraries; display libraries; peptides and peptoids; proteins; oligonucleotides; carbohydrates; natural diversity; new methods of library formulation and deconvolution; directed evolution, origin of life and recombination; search techniques, landscapes, random chemistry and more.