Exploring graph-based models for predicting active compounds against triple-negative breast cancer.

IF 3.8 | Zone 2 (Chemistry) | JCR Q2, CHEMISTRY, APPLIED

Molecular Diversity Pub Date : 2025-07-09 DOI:10.1007/s11030-025-11283-7

Hridoy Jyoti Mahanta, Amarjeet Boruah, Bikram Phukan, Hillul Chutia, Pankaj Bharali, Selvaraman Nagamani


Citations: 0

Abstract


Exploring graph-based models for predicting active compounds against triple-negative breast cancer.

Breast cancer is among the most dominant and rapidly rising cancers, both in India and around the world. Triple-negative breast cancer (TNBC) is one of the most aggressive subtypes of breast cancer, distinguished by the absence of HER2, progesterone, and estrogen receptor expression. This absence limits treatment options, emphasizing the urgent need to discover or design new drug candidates for TNBC. Integrating artificial intelligence and machine learning into computational modeling has significantly accelerated the analysis of large-scale biological data and improved the prediction of therapeutic outcomes. In this study, we curated a data set of 756 mutant-type compounds from three cell lines and developed four graph-based models to predict active compounds against TNBC. Validated using stratified nested tenfold cross-validation and optimized with the Optuna framework, the models achieved AUC values of 0.65-0.82, with the MPNN model outperforming all the others. Furthermore, key structural fragments associated with cell inhibition and model predictions were identified and interpreted using several explainability techniques. Validation with an external set of FDA-approved drugs demonstrated prediction accuracies ranging from 66% to 97%, highlighting the robustness of the models in identifying compounds with potential inhibitory activity against TNBC cells.
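The validation scheme named in the abstract, stratified nested cross-validation scored by AUC, can be sketched in plain Python. This is only an illustration under loud assumptions, not the authors' pipeline: the data are synthetic one-dimensional "descriptors", a toy nearest-neighbour scorer stands in for the graph-based models, and a small grid search stands in for Optuna. All function names (`stratified_folds`, `roc_auc`, `knn_scores`, `cv_auc`, `nested_cv`) are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Split sample positions into k folds, keeping the class ratio in each."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)  # deal each class round-robin
    return folds

def roc_auc(y_true, scores):
    """AUC as the chance that a random active outscores a random inactive."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_scores(train_x, train_y, test_x, k):
    """Toy stand-in model: fraction of actives among the k nearest neighbours."""
    scores = []
    for x in test_x:
        order = sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
        scores.append(sum(train_y[j] for j in order[:k]) / k)
    return scores

def cv_auc(x, y, idx, n_folds, k, seed):
    """Mean AUC of the toy model over stratified folds of the samples in idx."""
    aucs = []
    for fold in stratified_folds([y[i] for i in idx], n_folds, seed):
        test = [idx[j] for j in fold]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        s = knn_scores([x[i] for i in train], [y[i] for i in train],
                       [x[i] for i in test], k)
        aucs.append(roc_auc([y[i] for i in test], s))
    return sum(aucs) / len(aucs)

def nested_cv(x, y, grid, outer=10, inner=3, seed=0):
    """Outer folds estimate performance; inner folds pick the hyperparameter."""
    everything = list(range(len(y)))
    outer_aucs = []
    for fold in stratified_folds(y, outer, seed):
        held_out = set(fold)
        train = [i for i in everything if i not in held_out]
        # Tuning sees only the outer-training samples, never the test fold.
        best_k = max(grid, key=lambda k: cv_auc(x, y, train, inner, k, seed + 1))
        s = knn_scores([x[i] for i in train], [y[i] for i in train],
                       [x[i] for i in fold], best_k)
        outer_aucs.append(roc_auc([y[i] for i in fold], s))
    return sum(outer_aucs) / len(outer_aucs)

# Synthetic data (assumption): 50 "actives" and 50 "inactives".
rng = random.Random(42)
y = [1] * 50 + [0] * 50
x = [rng.gauss(1.0, 1.0) if label == 1 else rng.gauss(0.0, 1.0) for label in y]
mean_auc = nested_cv(x, y, grid=(1, 5, 15))
print(f"nested-CV mean AUC: {mean_auc:.2f}")
```

The point of the nesting is that hyperparameters are chosen without ever seeing the outer test fold, so the reported AUC is not inflated by tuning. In a real setting the grid search would be replaced by an optimizer such as Optuna and the toy scorer by the actual graph models.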


Source journal

Molecular Diversity (Chemistry: Multidisciplinary Chemistry)

CiteScore: 7.30
Self-citation rate: 7.90%
Publication volume: 219
Review time: 2.7 months

Journal description: Molecular Diversity is a new publication forum for the rapid publication of refereed papers dedicated to describing the development, application and theory of molecular diversity and combinatorial chemistry in basic and applied research and drug discovery. The journal publishes both short and full papers, perspectives, news and reviews dealing with all aspects of the generation of molecular diversity, application of diversity for screening against alternative targets of all types (biological, biophysical, technological), analysis of results obtained and their application in various scientific disciplines/approaches including: combinatorial chemistry and parallel synthesis; small molecule libraries; microwave synthesis; flow synthesis; fluorous synthesis; diversity oriented synthesis (DOS); nanoreactors; click chemistry; multiplex technologies; fragment- and ligand-based design; structure/function/SAR; computational chemistry and molecular design; chemoinformatics; screening techniques and screening interfaces; analytical and purification methods; robotics, automation and miniaturization; targeted libraries; display libraries; peptides and peptoids; proteins; oligonucleotides; carbohydrates; natural diversity; new methods of library formulation and deconvolution; directed evolution, origin of life and recombination; search techniques, landscapes, random chemistry and more.