Federated deep learning enables cancer subtyping by proteomics.

IF 29.7 | Zone 1, Medicine | JCR Q1, Oncology
Zhaoxiang Cai, Emma L Boys, Zainab Noor, Adel T Aref, Dylan Xavier, Natasha Lucas, Steven G Williams, Jennifer Ms Koh, Rebecca C Poulos, Yangxiu Wu, Michael Dausmann, Karen L MacKenzie, Adriana Aguilar-Mahecha, Carolina Armengol, Maria M Barranco, Mark Basik, Elise D Bowman, Roderick Clifton-Bligh, Elizabeth A Connolly, Wendy A Cooper, Bhavik Dalal, Anna DeFazio, Martin Filipits, Peter J Flynn, J Dinny Graham, Jacob George, Anthony J Gill, Michael Gnant, Rosemary Habib, Curtis C Harris, Kate Harvey, Lisa G Horvath, Christopher Jackson, Maija R J Kohonen-Corish, Elgene Lim, Jia Jenny Liu, Georgina V Long, Reginald V Lord, Graham J Mann, Geoffrey W McCaughan, Lucy Morgan, Leigh Murphy, Sumanth Nagabushan, Adnan Nagrial, Jordi Navinés, Benedict J Panizza, Jaswinder S Samra, Richard A Scolyer, John Souglakos, Alexander Swarbrick, David Thomas, Rosemary L Balleine, Peter G Hains, Phillip J Robinson, Qing Zhong, Roger R Reddel
DOI: 10.1158/2159-8290.CD-24-1488 (https://doi.org/10.1158/2159-8290.CD-24-1488)
Journal: Cancer Discovery
Published: 2025-06-09
Publication type: Journal Article
Citations: 0

Abstract


Artificial intelligence applications in biomedicine face major challenges from data privacy requirements. To address this issue for clinically annotated tissue proteomic data, we developed a Federated Deep Learning (FDL) approach (ProCanFDL), training local models on simulated sites containing data from a pan-cancer cohort (n=1,260) and 29 cohorts held behind private firewalls (n=6,265), representing 19,930 replicate data-independent acquisition mass spectrometry (DIA-MS) runs. Local parameter updates were aggregated to build the global model, achieving a 43% performance gain on the hold-out test set (n=625) in 14 cancer subtyping tasks compared to local models, and matching centralized model performance. The approach's generalizability was demonstrated by retraining the global model with data from two external DIA-MS cohorts (n=55) and eight acquired by tandem mass tag (TMT) proteomics (n=832). ProCanFDL presents a solution for internationally collaborative machine learning initiatives using proteomic data, e.g., for discovering predictive biomarkers or treatment targets, while maintaining data privacy.
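The aggregation step mentioned in the abstract ("local parameter updates were aggregated to build the global model") can be illustrated with a minimal sketch. The abstract does not state which aggregation rule ProCanFDL uses; the example below assumes sample-size-weighted federated averaging (FedAvg), a common choice, and the function name `federated_average`, the toy parameters, and the site sizes are all hypothetical.

```python
import numpy as np


def federated_average(local_params, sample_counts):
    """Combine per-site parameters into one global set (FedAvg, assumed).

    local_params: one entry per site; each entry is a list of arrays
                  (one array per layer), with matching shapes across sites.
    sample_counts: number of training samples held at each site.
    """
    total = float(sum(sample_counts))
    n_layers = len(local_params[0])
    # Each layer of the global model is the weighted mean of that layer
    # across sites, with weights proportional to site sample size.
    return [
        sum(site[k] * (n / total) for site, n in zip(local_params, sample_counts))
        for k in range(n_layers)
    ]


# Toy example: two hypothetical sites, each holding a two-layer model.
site_a = [np.array([1.0, 2.0]), np.array([0.0])]
site_b = [np.array([3.0, 4.0]), np.array([1.0])]
global_params = federated_average([site_a, site_b], sample_counts=[100, 300])
print(global_params)
```

Only the parameter arrays and sample counts leave each site in this scheme; the underlying proteomic measurements stay local, which is the privacy property the abstract emphasizes.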

Source journal
Cancer Discovery (Oncology)
CiteScore: 22.90
Self-citation rate: 1.40%
Articles published: 838
Review time: 6-12 weeks
About the journal: Cancer Discovery publishes high-impact, peer-reviewed articles detailing significant advances in both research and clinical trials. Serving as a premier cancer information resource, the journal also features Review Articles, Perspectives, Commentaries, News stories, and Research Watch summaries to keep readers abreast of the latest findings in the field. Covering a wide range of topics, from laboratory research to clinical trials and epidemiologic studies, Cancer Discovery spans the entire spectrum of cancer research and medicine.