Federated learning for enhanced dose–volume parameter prediction with decentralized data

IF 3.2; Zone 2 (Medicine); JCR Q1 (Radiology, Nuclear Medicine & Medical Imaging)
Medical Physics. Pub Date: 2024-12-06. DOI: 10.1002/mp.17566
Jiahan Zhang, Yang Lei, Junyi Xia, Ming Chao, Tian Liu
Medical Physics, volume 52, issue 3, pages 1408-1415.

Abstract

Background

The widespread adoption of knowledge-based planning in radiation oncology clinics is hindered by the lack of data and the difficulty associated with sharing medical data.

Purpose

This study assesses the feasibility of mitigating this challenge through federated learning (FL), in which a centralized model is trained on distributed datasets while the data remain localized and private.

Methods

This concept was tested using 273 prostate 45 Gy plans. The cases were split into a training set of 220 cases and a validation set of 53 cases. The training set was further divided into 10 subsets to simulate treatment plans from different clinics. A gradient-boosting model was used to predict bladder and rectum V30Gy, V35Gy, and V40Gy. The Federated Averaging algorithm was employed to aggregate the individual model weights from the distributed datasets. Grid search with five-fold cross-validation within the training set was used to tune model hyperparameters. Additionally, we evaluated the robustness of the FL approach by varying the distribution of the training data across several scenarios, including different numbers of sites and imbalanced data across sites.
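
The aggregation step can be illustrated with a minimal sketch. The abstract does not say how the gradient-boosting models are parameterized or how sites are weighted, so the code below assumes that each site contributes a flat parameter vector and that sites are weighted by their number of training cases, as in the standard Federated Averaging formulation. The function name `federated_average` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(site_weights: Sequence[Sequence[float]],
                      site_sizes: Sequence[int]) -> List[float]:
    """Return the sample-size-weighted mean of per-site parameter vectors.

    Assumption (not from the abstract): every site shares the same flat
    parameter layout, and weighting is proportional to local case count.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one non-empty weight vector and one size per site")
    total_cases = sum(site_sizes)
    if total_cases <= 0:
        raise ValueError("total number of cases must be positive")
    dim = len(site_weights[0])
    if any(len(w) != dim for w in site_weights):
        raise ValueError("all sites must share the same parameter layout")
    # Only the parameter vectors and case counts leave each site; raw plans stay local.
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total_cases
        for j in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical sites holding 1 and 3 training cases.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

With equal case counts per site, as in the balanced 10-subset split of 220 cases, this reduces to a plain arithmetic mean of the site parameters.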

Results

The mean absolute error (MAE) of the FL model (4.7% ± 2.9%) is significantly lower than that of individual models trained separately (6.5% ± 4.9%, p < 0.001) and similar to that of a traditional centralized model (4.4% ± 2.8%, p = 0.14). The federated model is robust to the number of subsets, with MAEs of 4.7% ± 3.2%, 4.8% ± 3.1%, 4.8% ± 2.9%, 4.5% ± 2.8%, 4.9% ± 3.3%, and 4.8% ± 3.1% for 5, 10, 15, 20, 25, and 30 subsets, respectively. For the two imbalanced datasets, the FL model achieves MAEs of 4.5% ± 2.9% and 5.6% ± 4.0%, non-inferior to the balanced data model. For all bladder and rectum metrics, the FL model significantly outperforms 36.7% of individual models.
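
As a reading aid for the figures above, the sketch below computes an MAE and its spread from paired predictions. It assumes that the ± value is the sample standard deviation of the per-case absolute errors, which the abstract does not state. The function name `mae_with_spread` and the sample values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def mae_with_spread(predicted: Sequence[float],
                    actual: Sequence[float]) -> Tuple[float, float]:
    """Return (MAE, sample standard deviation) of the absolute errors.

    Inputs are dose-volume values in percent of organ volume, so the
    result is in percentage points.
    """
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need at least two paired values")
    abs_errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return mean(abs_errors), stdev(abs_errors)


if __name__ == "__main__":
    # Hypothetical predicted vs. planned V30Gy values for three cases.
    mae, spread = mae_with_spread([10.0, 20.0, 30.0], [12.0, 17.0, 30.0])
    print(f"MAE = {mae:.2f}% ± {spread:.2f}%")
```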

Conclusions

This study demonstrates the potential advantages of a federated model over individually trained models: the proposed FL approach achieves prediction accuracy similar to that of a conventional model without requiring centralized data storage. Even when local models struggle to produce accurate predictions because of data scarcity, the federated model consistently maintains high performance.
