Zhaoxiang Cai, Emma L Boys, Zainab Noor, Adel T Aref, Dylan Xavier, Natasha Lucas, Steven G Williams, Jennifer Ms Koh, Rebecca C Poulos, Yangxiu Wu, Michael Dausmann, Karen L MacKenzie, Adriana Aguilar-Mahecha, Carolina Armengol, Maria M Barranco, Mark Basik, Elise D Bowman, Roderick Clifton-Bligh, Elizabeth A Connolly, Wendy A Cooper, Bhavik Dalal, Anna DeFazio, Martin Filipits, Peter J Flynn, J Dinny Graham, Jacob George, Anthony J Gill, Michael Gnant, Rosemary Habib, Curtis C Harris, Kate Harvey, Lisa G Horvath, Christopher Jackson, Maija R J Kohonen-Corish, Elgene Lim, Jia Jenny Liu, Georgina V Long, Reginald V Lord, Graham J Mann, Geoffrey W McCaughan, Lucy Morgan, Leigh Murphy, Sumanth Nagabushan, Adnan Nagrial, Jordi Navinés, Benedict J Panizza, Jaswinder S Samra, Richard A Scolyer, John Souglakos, Alexander Swarbrick, David Thomas, Rosemary L Balleine, Peter G Hains, Phillip J Robinson, Qing Zhong, Roger R Reddel
Cancer Discovery. Journal Article. Published 2025-06-09. DOI: 10.1158/2159-8290.CD-24-1488 (https://doi.org/10.1158/2159-8290.CD-24-1488)
Federated deep learning enables cancer subtyping by proteomics.
Artificial intelligence applications in biomedicine face major challenges from data privacy requirements. To address this issue for clinically annotated tissue proteomic data, we developed a Federated Deep Learning (FDL) approach (ProCanFDL), training local models on simulated sites containing data from a pan-cancer cohort (n=1,260) and 29 cohorts held behind private firewalls (n=6,265), representing 19,930 replicate data-independent acquisition mass spectrometry (DIA-MS) runs. Local parameter updates were aggregated to build the global model, achieving a 43% performance gain on the hold-out test set (n=625) in 14 cancer subtyping tasks compared to local models, and matching centralized model performance. The approach's generalizability was demonstrated by retraining the global model with data from two external DIA-MS cohorts (n=55) and eight acquired by tandem mass tag (TMT) proteomics (n=832). ProCanFDL presents a solution for internationally collaborative machine learning initiatives using proteomic data, e.g., for discovering predictive biomarkers or treatment targets, while maintaining data privacy.
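The aggregation step mentioned in the abstract ("local parameter updates were aggregated to build the global model") can be illustrated with a minimal sketch of sample-size-weighted federated averaging (FedAvg). The abstract does not state which aggregation rule ProCanFDL uses, so FedAvg here is an assumption, and the function name `federated_average`, the parameter layout, and the toy site values are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: each model is a dict mapping a parameter name
# to a flat list of floats. This is an illustrative choice only.
Params = Dict[str, List[float]]


def federated_average(local_params: Sequence[Params],
                      n_samples: Sequence[int]) -> Params:
    """Combine per-site parameters into global parameters.

    Each site's parameters are weighted by its share of the total
    number of training samples (the FedAvg rule). Only parameters
    leave a site; the raw data never does.
    """
    if not local_params or len(local_params) != len(n_samples):
        raise ValueError("need one sample count per local model")
    total = sum(n_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_params: Params = {}
    for name in local_params[0]:
        aggregated = [0.0] * len(local_params[0][name])
        for params, n in zip(local_params, n_samples):
            weight = n / total
            for i, value in enumerate(params[name]):
                aggregated[i] += weight * value
        global_params[name] = aggregated
    return global_params


# Toy example: two simulated sites with made-up parameters and sizes.
site_a = {"w": [1.0, 2.0], "b": [0.0]}
site_b = {"w": [3.0, 4.0], "b": [1.0]}
global_model = federated_average([site_a, site_b], [100, 300])
print(global_model)  # {'w': [2.5, 3.5], 'b': [0.75]}
```

In a full training loop, the global parameters would be sent back to every site for another round of local training, and the aggregation would be repeated until the global model converges.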
About the journal:
Cancer Discovery publishes high-impact, peer-reviewed articles detailing significant advances in both research and clinical trials. Serving as a premier cancer information resource, the journal also features Review Articles, Perspectives, Commentaries, News stories, and Research Watch summaries to keep readers abreast of the latest findings in the field. Covering a wide range of topics, from laboratory research to clinical trials and epidemiologic studies, Cancer Discovery spans the entire spectrum of cancer research and medicine.