SynProtX: a large-scale proteomics-based deep learning model for predicting synergistic anticancer drug combinations.

IF 11.8, Zone 2 (Biology), JCR Q1, Multidisciplinary Sciences

GigaScience. Pub Date: 2025-01-06. DOI: 10.1093/gigascience/giaf080

Bundit Boonyarit, Matin Kositchutima, Tisorn Na Phattalung, Nattawin Yamprasert, Chanitra Thuwajit, Thanyada Rungrotmongkol, Sarana Nutanong



Abstract

Motivation: Drug combination therapy plays a pivotal role in addressing the molecular heterogeneity of cancer, improving treatment efficacy, minimizing resistance, and reducing toxicity. Deep learning approaches have significantly advanced drug combination discovery by addressing the limitations of conventional laboratory experiments, which are time-consuming and costly. While most existing models rely on the molecular structure of drugs and gene expression data, incorporating protein-level expression provides a more accurate representation of cellular behavior and drug responses. In this study, we introduce SynProtX, an enhanced deep learning model that explicitly integrates large-scale proteomics with deep neural networks (DNNs) and the molecular structure of drugs with graph neural networks (GNNs).

Results: The SynProtX-GATFP model, which combines molecular graphs and fingerprints through a graph attention network architecture, demonstrated superior predictive performance for the FRIEDMAN study dataset. We further evaluated its cell line-specific performance, which was accurate across diverse tissue and study datasets. By incorporating protein expression data, which reflect the functional state of cancer cells, the model consistently outperformed gene expression-only models. The generalizability of SynProtX was rigorously validated using cold-start prediction, including leave-drug-combination-out, leave-drug-out, and leave-cell-line-out validation strategies, highlighting its robust performance and potential for clinical applicability. Additionally, SynProtX identified key cancer-associated proteins and molecular substructures, offering novel insights into the biological mechanisms underlying drug synergy. These findings highlight the potential of integrating large-scale proteomics and multiomics data to advance anticancer drug design and combination therapy strategies for personalized medicine.

Availability and implementation: https://github.com/manbaritone/SynProtX
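
The three cold-start strategies named in the Results can be illustrated with a small sketch. The abstract does not define the splits precisely, so the code below shows one common reading: every test sample involves a drug pair, a drug, or a cell line that never appears in training. The `Sample` record, its field names, and the toy identifiers (`D1`, `C1`, ...) are hypothetical and are not taken from the SynProtX repository.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple


class Sample(NamedTuple):
    """One measured drug combination (hypothetical record layout)."""
    drug_a: str
    drug_b: str
    cell_line: str
    synergy: float  # placeholder value, not real data


Split = Tuple[List[Sample], List[Sample]]  # (train, test)


def pair_key(s: Sample) -> FrozenSet[str]:
    """Unordered drug pair, so (A, B) and (B, A) are the same combination."""
    return frozenset((s.drug_a, s.drug_b))


def leave_combination_out(samples: Iterable[Sample],
                          held_out_pairs: Iterable[Tuple[str, str]]) -> Split:
    """Test set holds every sample whose drug pair is held out."""
    samples = list(samples)
    held = {frozenset(p) for p in held_out_pairs}
    train = [s for s in samples if pair_key(s) not in held]
    test = [s for s in samples if pair_key(s) in held]
    return train, test


def leave_drug_out(samples: Iterable[Sample],
                   held_out_drugs: Iterable[str]) -> Split:
    """Test set holds every sample that involves at least one held-out drug."""
    samples = list(samples)
    held = set(held_out_drugs)
    train = [s for s in samples if not (pair_key(s) & held)]
    test = [s for s in samples if pair_key(s) & held]
    return train, test


def leave_cell_line_out(samples: Iterable[Sample],
                        held_out_cells: Iterable[str]) -> Split:
    """Test set holds every sample measured on a held-out cell line."""
    samples = list(samples)
    held = set(held_out_cells)
    train = [s for s in samples if s.cell_line not in held]
    test = [s for s in samples if s.cell_line in held]
    return train, test


# Toy data: identifiers and scores are placeholders for illustration only.
SAMPLES = [
    Sample("D1", "D2", "C1", 0.0),
    Sample("D2", "D1", "C2", 0.0),
    Sample("D1", "D3", "C1", 0.0),
    Sample("D2", "D3", "C2", 0.0),
]

combo_train, combo_test = leave_combination_out(SAMPLES, [("D1", "D2")])
drug_train, drug_test = leave_drug_out(SAMPLES, ["D3"])
cell_train, cell_test = leave_cell_line_out(SAMPLES, ["C1"])
```

In this reading, leave-drug-out is the strictest of the drug-side splits, because any pair containing a held-out drug is excluded from training, whereas leave-drug-combination-out still lets each drug be seen in other pairs.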


Source journal: GigaScience (Multidisciplinary Sciences)

CiteScore: 15.50

Self-citation rate: 1.10%

Articles published: 119

Review time: 1 week

About the journal: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.