Federated deep learning enables cancer subtyping by proteomics
Zhaoxiang Cai, Emma L Boys, Zainab Noor, Adel T Aref, Dylan Xavier, Natasha Lucas, Steven G Williams, Jennifer M S Koh, Rebecca C Poulos, Yangxiu Wu, Michael Dausmann, Karen L MacKenzie, Adriana Aguilar-Mahecha, Carolina Armengol, Maria M Barranco, Mark Basik, Elise D Bowman, Roderick Clifton-Bligh, Elizabeth A Connolly, Wendy A Cooper, Bhavik Dalal, Anna DeFazio, Martin Filipits, Peter J Flynn, J Dinny Graham, Jacob George, Anthony J Gill, Michael Gnant, Rosemary Habib, Curtis C Harris, Kate Harvey, Lisa G Horvath, Christopher Jackson, Maija R J Kohonen-Corish, Elgene Lim, Jia Jenny Liu, Georgina V Long, Reginald V Lord, Graham J Mann, Geoffrey W McCaughan, Lucy Morgan, Leigh Murphy, Sumanth Nagabushan, Adnan Nagrial, Jordi Navinés, Benedict J Panizza, Jaswinder S Samra, Richard A Scolyer, John Souglakos, Alexander Swarbrick, David Thomas, Rosemary L Balleine, Peter G Hains, Phillip J Robinson, Qing Zhong, Roger R Reddel
Cancer Discovery (journal article), published 2025-06-09. DOI: 10.1158/2159-8290.CD-24-1488 (https://doi.org/10.1158/2159-8290.CD-24-1488)
Impact factor: 29.7. JCR: Q1 (Oncology).
Abstract
Artificial intelligence applications in biomedicine face major challenges from data privacy requirements. To address this issue for clinically annotated tissue proteomic data, we developed a Federated Deep Learning (FDL) approach (ProCanFDL), training local models on simulated sites containing data from a pan-cancer cohort (n=1,260) and 29 cohorts held behind private firewalls (n=6,265), representing 19,930 replicate data-independent acquisition mass spectrometry (DIA-MS) runs. Local parameter updates were aggregated to build the global model, achieving a 43% performance gain on the hold-out test set (n=625) in 14 cancer subtyping tasks compared to local models, and matching centralized model performance. The approach's generalizability was demonstrated by retraining the global model with data from two external DIA-MS cohorts (n=55) and eight acquired by tandem mass tag (TMT) proteomics (n=832). ProCanFDL presents a solution for internationally collaborative machine learning initiatives using proteomic data, e.g., for discovering predictive biomarkers or treatment targets, while maintaining data privacy.
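The abstract states that local parameter updates were aggregated to build the global model, but it does not say which aggregation rule was used. The sketch below illustrates one common choice, sample-size-weighted averaging in the style of FedAvg, purely as an assumption. The function name `federated_average`, the parameter names, and the toy values are hypothetical and are not taken from ProCanFDL.

```python
from typing import Dict, List

# A model is represented as named flat parameter vectors (an assumption
# made for illustration; real models hold multi-dimensional tensors).
Params = Dict[str, List[float]]


def federated_average(local_params: List[Params], n_samples: List[int]) -> Params:
    """Combine per-site parameters into global parameters.

    Each site's parameters are weighted by its number of training samples,
    so only parameters and counts leave a site, never the raw data.
    """
    if not local_params or len(local_params) != len(n_samples):
        raise ValueError("need one sample count per site")
    total = sum(n_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_params: Params = {}
    for name in local_params[0]:
        length = len(local_params[0][name])
        # Weighted mean of each parameter entry across sites.
        global_params[name] = [
            sum(p[name][i] * n for p, n in zip(local_params, n_samples)) / total
            for i in range(length)
        ]
    return global_params


# Two hypothetical sites with toy parameter values and sample counts.
site_a = {"w": [1.0, 2.0], "b": [0.0]}
site_b = {"w": [3.0, 6.0], "b": [4.0]}
global_model = federated_average([site_a, site_b], n_samples=[1, 3])
print(global_model)  # {'w': [2.5, 5.0], 'b': [3.0]}
```

In a full training loop this step would typically be repeated over several rounds, with the aggregated parameters sent back to each site as the starting point for the next round of local training.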
About the journal:
Cancer Discovery publishes high-impact, peer-reviewed articles detailing significant advances in both research and clinical trials. Serving as a premier cancer information resource, the journal also features Review Articles, Perspectives, Commentaries, News stories, and Research Watch summaries to keep readers abreast of the latest findings in the field. Covering a wide range of topics, from laboratory research to clinical trials and epidemiologic studies, Cancer Discovery spans the entire spectrum of cancer research and medicine.