Data-Driven Drug Discovery Optimization for Breast Cancer Using Interpretable Machine Learning Models
Dyuti Banerjee, Sivaneasan Bala Krishnan, Kamal Upreti, Sumegh Shrikant Tharewal, Uma Shankar, Pravin Kshirsagar, Manoj Kumar
JoVE (Journal of Visualized Experiments), vol. 223. Published 2025-09-12. Journal Article. DOI: 10.3791/68705 (https://doi.org/10.3791/68705)
Impact factor: 1.2. JCR: Q3 (Multidisciplinary Sciences).
Citations: 0
Abstract
Breast cancer remains one of the most prevalent malignancies worldwide, posing significant therapeutic challenges due to tumor heterogeneity and drug resistance. This study presents a reproducible, data-driven machine learning protocol for predicting drug sensitivity in breast cancer cell lines, with the dual objective of identifying potent single agents and synergistic drug combinations. Using curated datasets from the Genomics of Drug Sensitivity in Cancer (GDSC), two predictive approaches were implemented: a standalone XGBoost regressor and a hybrid Autoencoder-XGBoost pipeline. Preprocessing included label encoding, one-hot encoding, Z-score standardization, missing value imputation, and dimensionality reduction via PCA. Model evaluation demonstrated that XGBoost achieved superior performance (MSE = 1.3789, R² = 0.8145) compared to the hybrid model (MSE = 4.0322, R² = 0.4577). Interpretability was addressed using SHapley Additive exPlanations (SHAP), which identified TARGET_PATHWAY, DRUG_ID, TARGET, and CELL_LINE_NAME as key predictive features, aligning with established pharmacological mechanisms. Predicted synergy scores, derived from combining model outputs with DrugComb and SynergyDB data, highlighted promising drug pairs such as Bortezomib + Romidepsin and Paclitaxel + Bortezomib. These findings were further supported by PCA-based pharmacological clustering, revealing biologically relevant groupings of drugs with similar mechanisms of action. The proposed protocol provides a transparent and adaptable framework for precision oncology research, enabling both predictive accuracy and biological interpretability. By integrating rigorous preprocessing, model validation, explainability, and drug synergy analysis, this workflow offers a scalable foundation for translational drug discovery and repurposing in breast cancer treatment.
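The abstract compares the two models by MSE and R². As a reading aid, the standard definitions of those two metrics can be sketched in plain Python as below. This is not the authors' evaluation code, which the abstract does not show; the function names and the toy numbers are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values standing in for observed and predicted
# drug-response scores; they are NOT taken from GDSC or from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

mse = mean_squared_error(observed, predicted)
r2 = r2_score(observed, predicted)
print(f"MSE = {mse:.4f}, R^2 = {r2:.4f}")
```

Under these definitions, a lower MSE and an R² closer to 1 both indicate a better fit, which is the sense in which the standalone XGBoost regressor is reported to outperform the hybrid pipeline.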
About the journal:
JoVE, the Journal of Visualized Experiments, is the world's first peer-reviewed scientific video journal. Established in 2006, JoVE is devoted to publishing scientific research in a visual format to help researchers overcome two of the biggest challenges facing the scientific research community today: poor reproducibility and the time- and labor-intensive nature of learning new experimental techniques.