Towards practical federated learning and evaluation for medical prediction models

IF 4.1 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

International Journal of Medical Informatics, Pub Date: 2025-07-18, DOI: 10.1016/j.ijmedinf.2025.106046

Andrei Kazlouski, Ileana Montoya Perez, Faiza Noor, Mikael Högerman, Otto Ettala, Tapio Pahikkala, Antti Airola

Volume 204, Article 106046. Full text: https://www.sciencedirect.com/science/article/pii/S1386505625002631

Citations: 0

Abstract

Background: Federated learning (FL) is a rapidly advancing technique that enables collaborative model training while preserving data privacy. This approach is particularly relevant in healthcare, where privacy concerns and regulatory restrictions often prevent centralized data sharing. FL has shown promise in tasks such as disease detection, achieving performance levels comparable to centralized systems. However, its practical usability in real-world applications remains underexplored.

Methods: We evaluate the practical effectiveness of FL in predicting whether patients suspected of prostate cancer require invasive biopsy procedures. The study uses 14 publicly available prostate cancer datasets from 10 countries. We propose and benchmark a novel FL evaluation strategy, Leave-Silo-Out (LSO), which quantifies the performance gap between federated training and free-riding (utilizing the federated model without contributing data). Additionally, we investigate whether locally trained models can outperform multi-hospital FL models. The results are assessed with a focus on improving the diagnosis of local patients.

Results: Our findings reveal that the benefits of FL vary with the amount of locally available annotated data. Hospitals with very small datasets see negligible improvements from FL compared to free-riding. Institutions with moderate datasets may achieve some gains through FL training. However, hospitals with extensive datasets often experience little to no advantage from FL and, in some cases, observe reduced performance compared to local training.

Conclusion: Federated learning shows potential in scenarios with limited data availability. However, its practical applicability is highly context-dependent, influenced by factors such as data availability and specific task requirements.
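The abstract does not spell out the Leave-Silo-Out procedure, so the following is only a minimal sketch of one plausible reading: for each hospital (silo), a federated model trained with that silo's data is compared against a free-riding model trained without it and against a local-only model, all scored on the silo's own held-out patients. The synthetic data, the logistic-regression model, the FedAvg-style weighted averaging, the accuracy metric, and every function name here are assumptions made for illustration, not the authors' implementation.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_silo(rng, n, shift):
    """Synthetic stand-in for one hospital: rows of (x1, x2, label)."""
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)
        y = 1 if rng.random() < sigmoid(1.5 * x1 - x2) else 0
        rows.append((x1, x2, y))
    return rows

def local_update(w, rows, lr=0.1, epochs=5):
    """A few full-batch gradient steps of logistic regression on one silo."""
    w = list(w)
    for _ in range(epochs):
        g = [0.0, 0.0, 0.0]
        for x1, x2, y in rows:
            err = sigmoid(w[0] * x1 + w[1] * x2 + w[2]) - y
            g[0] += err * x1
            g[1] += err * x2
            g[2] += err
        w = [wi - lr * gi / len(rows) for wi, gi in zip(w, g)]
    return w

def fedavg(silos, rounds=20):
    """Average the silos' locally updated weights, weighted by silo size."""
    w = [0.0, 0.0, 0.0]
    total = sum(len(rows) for rows in silos)
    for _ in range(rounds):
        updates = [(len(rows), local_update(w, rows)) for rows in silos]
        w = [sum(n * u[i] for n, u in updates) / total for i in range(3)]
    return w

def accuracy(w, rows):
    hits = sum((w[0] * x1 + w[1] * x2 + w[2] > 0) == (y == 1)
               for x1, x2, y in rows)
    return hits / len(rows)

def leave_silo_out(silos, test_frac=0.5):
    """Score three models on each silo's held-out rows (needs >= 2 rows per silo)."""
    results = []
    for k, rows in enumerate(silos):
        n_test = max(1, int(len(rows) * test_frac))
        test, train = rows[:n_test], rows[n_test:]
        others = [s for j, s in enumerate(silos) if j != k]
        w_free = fedavg(others)            # silo k contributes no data
        w_fed = fedavg(others + [train])   # silo k takes part in training
        w_local = local_update([0.0, 0.0, 0.0], train, epochs=100)
        results.append({
            "silo": k,
            "n_train": len(train),
            "federated": accuracy(w_fed, test),
            "free_riding": accuracy(w_free, test),
            "local": accuracy(w_local, test),
        })
    return results

rng = random.Random(0)
# Hypothetical small, moderate and large hospitals with slightly shifted data.
silos = [make_silo(rng, n, s) for n, s in [(20, 0.0), (100, 0.3), (400, -0.3)]]
results = leave_silo_out(silos)
for row in results:
    gap = row["federated"] - row["free_riding"]
    print(row, "participation gap:", round(gap, 3))
```

The printed participation gap is the quantity of interest in this reading: a value near zero means the silo would have done about as well by free-riding, while comparing `federated` with `local` indicates whether a silo with plenty of its own data gains anything from joining.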

Source journal

International Journal of Medical Informatics (Medicine, Computer Science: Information Systems)

CiteScore: 8.90

Self-citation rate: 4.10%

Articles published: 217

Review time: 42 days

Journal description: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; and organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.