A comprehensive experimental comparison between federated and centralized learning.

IF 3.6, Zone 4 (Biology), JCR Q1: MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation. Pub Date: 2025-03-19. DOI: 10.1093/database/baaf016

Swier Garst, Julian Dekker, Marcel Reinders


Citations: 0

Abstract




Federated learning is an emerging machine learning paradigm that allows data from multiple sources to be used for training classifiers without the data leaving the source where it originally resides. This can be highly valuable for use cases such as medical research, where gathering data at a central location can be complicated by privacy and legal concerns. In such cases, federated learning has the potential to vastly speed up the research cycle. Although federated and central learning have been compared from a theoretical perspective, an extensive experimental comparison of performance and learning behavior is still lacking. We have performed a comprehensive experimental comparison between federated and centralized learning. We evaluated various classifiers on various datasets, exploring the influence of different sample distributions as well as different class distributions across the clients. The results show similar performance between the federated and central learning strategies under a wide variety of settings. Federated learning is able to deal with various imbalances in the data distributions. Like central learning, it is sensitive to batch effects between different datasets when they coincide with location, but in the federated setting this situation might go unobserved more easily. Federated learning seems to be robust to various challenges such as skewed data distributions, high data dimensionality, multiclass problems, and complex models. Taken together, the insights from our comparison give much promise for applying federated learning as an alternative to sharing data. Code for reproducing the results in this work can be found at: https://github.com/swiergarst/FLComparison.
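
To make the federated versus central contrast concrete, the following is a minimal sketch and not the authors' experimental code (which is in the repository linked above). It assumes a federated-averaging style aggregation of logistic-regression weights, synthetic data, and three hypothetical clients with unequal sample counts; the function names, learning rate, and round counts are all illustrative choices that do not come from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_update(w, X, y, lr=0.5, epochs=5):
    """Run a few full-batch gradient steps of logistic regression on one dataset."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

def federated_average(clients, dim, rounds=50):
    """Assumed FedAvg-style loop: only weights leave a client, never raw data."""
    w = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        # Each client starts from the current global weights and trains locally.
        local = np.stack([local_update(w, X, y) for X, y in clients])
        # The server averages client weights, weighted by client sample count.
        w = (sizes[:, None] * local).sum(axis=0) / sizes.sum()
    return w

def accuracy(w, X, y):
    return float(np.mean((sigmoid(X @ w) > 0.5) == y))

# Synthetic binary classification problem (illustrative only).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(600, 3))
y = (X @ true_w + 0.1 * rng.normal(size=600) > 0).astype(float)

# Unequal sample distribution: clients hold 50, 150, and 400 samples.
splits = np.split(np.arange(600), [50, 200])
clients = [(X[idx], y[idx]) for idx in splits]

w_fed = federated_average(clients, dim=3)               # federated training
w_central = local_update(np.zeros(3), X, y, epochs=250)  # pooled (central) training

# Accuracy is measured on the training data to keep the sketch short.
acc_fed = accuracy(w_fed, X, y)
acc_central = accuracy(w_central, X, y)
print(f"federated: {acc_fed:.3f}  central: {acc_central:.3f}")
```

Both runs take the same total number of gradient steps, so any gap between the two accuracies in this toy setting comes from where the data sits rather than from training budget. Making the client splits differ in class balance or feature distribution is one way to probe the imbalance and batch-effect scenarios the abstract mentions.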


Source journal

Database: The Journal of Biological Databases and Curation (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 9.00
Self-citation rate: 3.40%
Articles published: 100
Review time: >12 weeks

About the journal: Huge volumes of primary data are archived in numerous open-access databases, and with new-generation technologies becoming more common in laboratories, large datasets will become even more prevalent. The archiving, curation, analysis and interpretation of all of these data are a challenge. Database development and biocuration are at the forefront of the endeavor to make sense of this mounting deluge of data. Database: The Journal of Biological Databases and Curation provides an open access platform for the presentation of novel ideas in database research and biocuration, and aims to help strengthen the bridge between database developers, curators, and users.