Nghi C D Truong,Chandan Ganesh Bangalore Yogananda,Benjamin C Wagner,James M Holcomb,Divya D Reddy,Niloufar Saadat,Jason Bowerman,Kimmo J Hatanpaa,Toral R Patel,Baowei Fei,Matthew D Lee,Rajan Jain,Richard J Bruce,Ananth J Madhuranthakam,Marco C Pinho,Joseph A Maldjian
{"title":"Categorical and phenotypic image synthetic learning as an alternative to federated learning.","authors":"Nghi C D Truong,Chandan Ganesh Bangalore Yogananda,Benjamin C Wagner,James M Holcomb,Divya D Reddy,Niloufar Saadat,Jason Bowerman,Kimmo J Hatanpaa,Toral R Patel,Baowei Fei,Matthew D Lee,Rajan Jain,Richard J Bruce,Ananth J Madhuranthakam,Marco C Pinho,Joseph A Maldjian","doi":"10.1038/s41467-025-64385-z","DOIUrl":null,"url":null,"abstract":"Multi-center collaborations are crucial in developing robust and generalizable machine learning models in medical imaging. Traditional methods, such as centralized data sharing or federated learning (FL), face challenges, including privacy issues, communication burdens, and synchronization complexities. We present CATegorical and PHenotypic Image SyntHetic learnING (CATphishing), an alternative to FL using Latent Diffusion Models (LDM) to generate synthetic multi-contrast three-dimensional magnetic resonance imaging data for downstream tasks, eliminating the need for raw data sharing or iterative inter-site communication. Each institution trains an LDM to capture site-specific data distributions, producing synthetic samples aggregated at a central server. We evaluate CATphishing using data from 2491 patients across seven institutions for isocitrate dehydrogenase mutation classification and three-class tumor-type classification. CATphishing achieves accuracy comparable to centralized training and FL, with synthetic data exhibiting high fidelity. This method addresses privacy, scalability, and communication challenges, offering a promising alternative for collaborative artificial intelligence development in medical imaging.","PeriodicalId":19066,"journal":{"name":"Nature Communications","volume":"105 1","pages":"9384"},"PeriodicalIF":15.7000,"publicationDate":"2025-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature Communications","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1038/s41467-025-64385-z","RegionNum":1,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0
Abstract
Multi-center collaborations are crucial in developing robust and generalizable machine learning models in medical imaging. Traditional methods, such as centralized data sharing or federated learning (FL), face challenges, including privacy issues, communication burdens, and synchronization complexities. We present CATegorical and PHenotypic Image SyntHetic learnING (CATphishing), an alternative to FL using Latent Diffusion Models (LDM) to generate synthetic multi-contrast three-dimensional magnetic resonance imaging data for downstream tasks, eliminating the need for raw data sharing or iterative inter-site communication. Each institution trains an LDM to capture site-specific data distributions, producing synthetic samples aggregated at a central server. We evaluate CATphishing using data from 2491 patients across seven institutions for isocitrate dehydrogenase mutation classification and three-class tumor-type classification. CATphishing achieves accuracy comparable to centralized training and FL, with synthetic data exhibiting high fidelity. This method addresses privacy, scalability, and communication challenges, offering a promising alternative for collaborative artificial intelligence development in medical imaging.
期刊介绍:
Nature Communications, an open-access journal, publishes high-quality research spanning all areas of the natural sciences. Papers featured in the journal showcase significant advances relevant to specialists in each respective field. With a 2-year impact factor of 16.6 (2022) and a median time of 8 days from submission to the first editorial decision, Nature Communications is committed to rapid dissemination of research findings. As a multidisciplinary journal, it welcomes contributions from biological, health, physical, chemical, Earth, social, mathematical, applied, and engineering sciences, aiming to highlight important breakthroughs within each domain.