SynProtX: a large-scale proteomics-based deep learning model for predicting synergistic anticancer drug combinations
Authors: Bundit Boonyarit, Matin Kositchutima, Tisorn Na Phattalung, Nattawin Yamprasert, Chanitra Thuwajit, Thanyada Rungrotmongkol, Sarana Nutanong
Journal: GigaScience, volume 14
DOI: https://doi.org/10.1093/gigascience/giaf080
Publication date: 2025-01-06
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12343095/pdf/
Citations: 0
Abstract
Motivation: Drug combination therapy plays a pivotal role in addressing the molecular heterogeneity of cancer, improving treatment efficacy, minimizing resistance, and reducing toxicity. Deep learning approaches have significantly advanced drug combination discovery by addressing the limitations of conventional laboratory experiments, which are time-consuming and costly. While most existing models rely on the molecular structure of drugs and gene expression data, incorporating protein-level expression provides a more accurate representation of cellular behavior and drug responses. In this study, we introduce SynProtX, an enhanced deep learning model that explicitly integrates large-scale proteomics with deep neural networks (DNNs) and the molecular structure of drugs with graph neural networks (GNNs).
Results: The SynProtX-GATFP model, which combines molecular graphs and fingerprints through a graph attention network architecture, demonstrated superior predictive performance on the FRIEDMAN study dataset. We further evaluated its cell line-specific performance, which achieved accuracy across diverse tissue and study datasets. By incorporating protein expression data, the model consistently enhanced predictive performance over gene expression-only models, reflecting the functional state of cancer cells. The generalizability of SynProtX was rigorously validated using cold-start prediction, including leave-drug-combination-out, leave-drug-out, and leave-cell-line-out validation strategies, highlighting its robust performance and potential for clinical applicability. Additionally, SynProtX identified key cancer-associated proteins and molecular substructures, offering novel insights into the biological mechanisms underlying drug synergy. These findings highlight the potential of integrating large-scale proteomics and multiomics data to advance anticancer drug design and combination therapy strategies for personalized medicine.
Availability and implementation: https://github.com/manbaritone/SynProtX.
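The three cold-start strategies named in the Results differ only in which entity is kept unseen during training. The sketch below illustrates that idea in plain Python. It is not taken from the SynProtX repository: the `Sample` record, the key functions, and `cold_start_split` are hypothetical names, the data are made up, and the way the paper chooses held-out entities or builds folds is not described in the abstract.

```python
from typing import Callable, Hashable, Iterable, List, NamedTuple, Set, Tuple


class Sample(NamedTuple):
    """One drug-pair measurement (hypothetical record layout)."""
    drug_a: str
    drug_b: str
    cell_line: str
    synergy: float


def combination_key(s: Sample) -> Set[Hashable]:
    # Unordered pair, so (A, B) and (B, A) count as the same combination.
    return {frozenset((s.drug_a, s.drug_b))}


def drug_key(s: Sample) -> Set[Hashable]:
    # A sample is "touched" by a held-out drug if either partner is held out.
    return {s.drug_a, s.drug_b}


def cell_line_key(s: Sample) -> Set[Hashable]:
    return {s.cell_line}


def cold_start_split(
    samples: Iterable[Sample],
    key_fn: Callable[[Sample], Set[Hashable]],
    held_out: Iterable[Hashable],
) -> Tuple[List[Sample], List[Sample]]:
    """Put every sample that involves a held-out entity into the test set."""
    held = set(held_out)
    train: List[Sample] = []
    test: List[Sample] = []
    for s in samples:
        # Any overlap with the held-out entities sends the sample to test,
        # so the training set never sees those entities.
        (test if key_fn(s) & held else train).append(s)
    return train, test


# Made-up data, for illustration only.
samples = [
    Sample("drugA", "drugB", "cell1", 0.1),
    Sample("drugB", "drugA", "cell2", 0.2),
    Sample("drugA", "drugC", "cell1", 0.3),
    Sample("drugC", "drugD", "cell2", 0.4),
    Sample("drugB", "drugD", "cell3", 0.5),
]

# Leave-drug-combination-out: the pair (drugA, drugB) is unseen in training.
combo_train, combo_test = cold_start_split(
    samples, combination_key, [frozenset(("drugA", "drugB"))]
)

# Leave-drug-out: no training sample contains drugA at all.
drug_train, drug_test = cold_start_split(samples, drug_key, ["drugA"])

# Leave-cell-line-out: cell3 is unseen in training.
cell_train, cell_test = cold_start_split(samples, cell_line_key, ["cell3"])
```

Of the three, leave-drug-out is the strictest on the drug side in this sketch, because every sample that contains the held-out drug leaves the training set, not only one specific pairing.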
About the journal:
GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.