A robust sampling technique for realistic distribution simulation in federated learning.

IF 2.3 3区 医学 Q3 ENGINEERING, BIOMEDICAL
Robin Hoepp, Leonhard Rist, Alexander Katzmann, Raghavan Ashok, Andreas Wimmer, Michael Sühling, Andreas Maier
{"title":"A robust sampling technique for realistic distribution simulation in federated learning.","authors":"Robin Hoepp, Leonhard Rist, Alexander Katzmann, Raghavan Ashok, Andreas Wimmer, Michael Sühling, Andreas Maier","doi":"10.1007/s11548-025-03504-z","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Federated Learning helps training deep learning networks with diverse data from different locations, particularly in restricted clinical settings. However, label distributions overlapping only partially across clients, due to different demographics, may significantly harm the global training, and thus local model performance. Investigating such effects before rolling out large-scale Federated Learning setups requires proper sampling of the expected label distributions.</p><p><strong>Methods: </strong>We present a sampling algorithm to build data subsets according to desired mean and standard deviations from an initial global distribution. To this end, we incorporate the chi-squared and Gini impurity measures to numerically optimize label distributions for multiple groups in an efficient fashion.</p><p><strong>Results: </strong>Using a real-world application scenario, we sample train and test groups according to region-specific distributions for 3D camera-based weight and height estimation in a clinical context, comparing a hard data split serving as a baseline with our proposed sampling technique. We train a baseline model on all data for comparison and use Federated Averaging to combine the training of our data subsets, demonstrating a realistic deterioration of 25.3 % on weight and 28.7 % on height estimations by the global model.</p><p><strong>Conclusions: </strong>Realistically client-biased label distribution can notably harm the training in a federated context. Our sampling algorithm for simulating realistic data distributions opens up an efficient way for prior analysis of this effect. The technique is agnostic to the chosen network architecture and target scenario and can be adapted to any feature or label problem with non-IID subpopulations.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Assisted Radiology and Surgery","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s11548-025-03504-z","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose: Federated Learning helps training deep learning networks with diverse data from different locations, particularly in restricted clinical settings. However, label distributions overlapping only partially across clients, due to different demographics, may significantly harm the global training, and thus local model performance. Investigating such effects before rolling out large-scale Federated Learning setups requires proper sampling of the expected label distributions.

Methods: We present a sampling algorithm to build data subsets according to desired mean and standard deviations from an initial global distribution. To this end, we incorporate the chi-squared and Gini impurity measures to numerically optimize label distributions for multiple groups in an efficient fashion.

Results: Using a real-world application scenario, we sample train and test groups according to region-specific distributions for 3D camera-based weight and height estimation in a clinical context, comparing a hard data split serving as a baseline with our proposed sampling technique. We train a baseline model on all data for comparison and use Federated Averaging to combine the training of our data subsets, demonstrating a realistic deterioration of 25.3 % on weight and 28.7 % on height estimations by the global model.

Conclusions: Realistically client-biased label distribution can notably harm the training in a federated context. Our sampling algorithm for simulating realistic data distributions opens up an efficient way for prior analysis of this effect. The technique is agnostic to the chosen network architecture and target scenario and can be adapted to any feature or label problem with non-IID subpopulations.

一种用于联邦学习中真实分布模拟的鲁棒抽样技术。
目的:联邦学习有助于训练来自不同地点的不同数据的深度学习网络,特别是在有限的临床环境中。然而,由于不同的人口统计数据,标签分布只在客户端部分重叠,可能会严重损害全局训练,从而影响局部模型的性能。在推出大规模的联邦学习设置之前,调查这种影响需要对预期的标签分布进行适当的采样。方法:我们提出了一种抽样算法,根据初始全局分布的期望均值和标准差建立数据子集。为此,我们结合卡方和基尼杂质措施,以有效的方式对多个群体的标签分布进行数值优化。结果:在现实世界的应用场景中,我们根据临床环境中基于3D相机的体重和身高估计的区域特定分布对训练和测试组进行采样,并将硬数据分割作为基线与我们提出的采样技术进行比较。我们在所有数据上训练了一个基线模型进行比较,并使用Federated Averaging将我们的数据子集的训练结合起来,证明了全球模型估计的体重和高度的实际退化程度分别为25.3%和28.7%。结论:实际上,客户偏见的标签分布会明显损害联邦环境中的训练。我们的模拟真实数据分布的抽样算法为这种效应的先验分析开辟了一种有效的方法。该技术与所选择的网络体系结构和目标场景无关,可以适用于具有非iid子种群的任何特征或标签问题。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
International Journal of Computer Assisted Radiology and Surgery
International Journal of Computer Assisted Radiology and Surgery ENGINEERING, BIOMEDICAL-RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
CiteScore
5.90
自引率
6.70%
发文量
243
审稿时长
6-12 weeks
期刊介绍: The International Journal for Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines, and encourages interdisciplinary research and development activities in an international environment.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信