Federated Learning for Renal Tumor Segmentation and Classification on Multi-Center MRI Dataset
Dat-Thanh Nguyen, Maliha Imami, Lin-Mei Zhao, Jing Wu, Ali Borhani, Alireza Mohseni, Mihir Khunte, Zhusi Zhong, Victoria Shi, Sophie Yao, Yuli Wang, Nicolas Loizou, Alvin C Silva, Paul J Zhang, Zishu Zhang, Zhicheng Jiao, Ihab Kamel, Wei-Hua Liao, Harrison Bai
Journal of Magnetic Resonance Imaging, published 2025-05-19. DOI: 10.1002/jmri.29819 (https://doi.org/10.1002/jmri.29819)
Abstract
Background: Deep learning (DL) models for accurate renal tumor characterization may benefit from multi-center datasets for improved generalizability; however, data-sharing constraints necessitate privacy-preserving solutions like federated learning (FL).
Purpose: To assess the performance and reliability of FL for renal tumor segmentation and classification in multi-institutional MRI datasets.
Study type: Retrospective multi-center study.
Population: A total of 987 patients (403 female) from six hospitals were included for analysis. 73% (723/987) had malignant renal tumors, primarily clear cell carcinoma (n = 509). Patients were split into training (n = 785), validation (n = 104), and test (n = 99) sets, stratified across three simulated institutions.
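The abstract does not say how patients were stratified across the three simulated institutions. As a rough illustration only, the sketch below assumes stratification by the malignant/benign label and deals the patients of each class round-robin to the clients; the function name `stratified_client_split` and the toy patient ids are hypothetical.

```python
import random
from collections import defaultdict


def stratified_client_split(labels, n_clients=3, seed=0):
    """Assign each patient to one of n_clients with similar class ratios.

    labels: dict mapping patient id -> class label (e.g. "malignant").
    Returns a dict mapping client index -> sorted list of patient ids.
    Assumption: stratification is by class label only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for pid, label in labels.items():
        by_class[label].append(pid)

    clients = {c: [] for c in range(n_clients)}
    for label in sorted(by_class):
        pids = sorted(by_class[label])
        rng.shuffle(pids)
        # Deal this class round-robin so every client gets a similar share.
        for i, pid in enumerate(pids):
            clients[i % n_clients].append(pid)
    return {c: sorted(pids) for c, pids in clients.items()}


# Toy cohort (hypothetical ids): 9 malignant and 3 benign cases.
toy_labels = {
    f"p{i:02d}": ("malignant" if i < 9 else "benign") for i in range(12)
}
toy_split = stratified_client_split(toy_labels)
```

In this toy cohort each simulated client receives four patients, three malignant and one benign, so the class ratio of the full cohort is preserved at every client.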
Field strength/sequence: MRI was performed at 1.5 T and 3 T using T2-weighted imaging (T2WI) and contrast-enhanced T1-weighted imaging (CE-T1WI) sequences.
Assessment: Both the FL and non-FL approaches used nnU-Net for tumor segmentation and ResNet for tumor classification. The FL approach trained models across three simulated institutional clients with central weight aggregation, whereas the non-FL approach used centralized training on the full dataset.
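The abstract states only that client models were combined by central weight aggregation. One common way to do this is a sample-size-weighted average of client parameters (FedAvg-style); whether the study used exactly this rule is an assumption. The sketch below uses plain Python lists in place of real network tensors, and `federated_average` is a hypothetical helper name.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of client parameters.

    client_weights: list of dicts, each mapping a parameter name to a
        list of floats (a stand-in for a real model tensor).
    client_sizes: number of training samples held by each client.
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("one size is required per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("clients must hold at least one sample in total")

    averaged = {}
    for name, reference in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes))
            / total
            for i in range(len(reference))
        ]
    return averaged


# Three simulated clients with toy parameters and sample counts.
toy_client_weights = [
    {"w": [1.0, 0.0]},
    {"w": [2.0, 4.0]},
    {"w": [3.0, 0.0]},
]
toy_client_sizes = [100, 200, 100]
global_weights = federated_average(toy_client_weights, toy_client_sizes)
```

In a full training loop the server would send `global_weights` back to every client, each client would train locally for some number of steps, and the aggregation would repeat each round.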
Statistical tests: Segmentation was evaluated using Dice coefficients, and classification between malignant and benign lesions was assessed using accuracy, sensitivity, specificity, and area under the curve (AUC). FL and non-FL performance was compared using the Wilcoxon test for segmentation Dice and DeLong's test for AUC, with p < 0.05 considered significant.
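For readers less familiar with these metrics, the following minimal sketch shows how they are commonly defined. It uses flat binary lists rather than real MRI volumes, assumes both classes are present in the labels, and is not the authors' evaluation code; all function names and toy values are illustrative.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    denom = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if denom == 0 else 2.0 * overlap / denom


def classification_metrics(pred, truth):
    """Accuracy, sensitivity and specificity from binary labels (1 = malignant)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return {
        "accuracy": (tp + tn) / len(truth),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(scores, truth):
    """AUC as the chance that a positive case outscores a negative one.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy inputs, not study data.
toy_dice = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
toy_metrics = classification_metrics([1, 1, 0, 0, 1], [1, 1, 1, 0, 0])
toy_auc = auc([0.9, 0.8, 0.3, 0.2, 0.6], [1, 1, 1, 0, 0])
```

On these toy inputs the Dice coefficient is 0.5, accuracy is 0.6, sensitivity is 2/3, specificity is 0.5, and AUC is 5/6.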
Results: No significant difference was observed between FL and non-FL models in segmentation (Dice: 0.43 vs. 0.45, p = 0.202) or classification (AUC: 0.69 vs. 0.64, p = 0.959) on the test set. For classification, no significant difference was observed between the models in accuracy (p = 0.912), sensitivity (p = 0.862), or specificity (p = 0.847) on the test set.
Data conclusion: FL demonstrated comparable performance to non-FL approaches in renal tumor segmentation and classification, supporting its potential as a privacy-preserving alternative for multi-institutional DL models.
Evidence level: 4.
Technical efficacy: Stage 2.
About the journal:
The Journal of Magnetic Resonance Imaging (JMRI) is an international journal devoted to the timely publication of basic and clinical research, educational and review articles, and other information related to the diagnostic applications of magnetic resonance.