Robust and Scalable Federated Learning Framework for Client Data Heterogeneity Based on Optimal Clustering
Authors: Zihan Li, Shuai Yuan, Zhitao Guan
Journal: Journal of Parallel and Distributed Computing (JCR Q1, Computer Science, Theory & Methods; impact factor 3.4)
Publication type: Journal Article
Published: 2024-09-22
DOI: 10.1016/j.jpdc.2024.104990
URL: https://www.sciencedirect.com/science/article/pii/S0743731524001540
Federated learning is a promising paradigm for applications across a variety of domains. However, real-world deployments face several challenges, particularly data heterogeneity among participating clients. Most existing studies focus on non-independent and identically distributed (non-IID) data but do not consider the critical aspect of data quality heterogeneity. When some clients contribute low-quality data, the efficacy of models trained with traditional approaches is significantly compromised. Therefore, we propose ROSCFL, a robust and scalable federated learning framework for client data heterogeneity based on optimal clustering. We first develop a cluster contribution evaluation strategy based on optimal clustering to quantify the contribution of each cluster. Next, we design a robust model aggregation strategy that mitigates the impact of low-quality data on the global model by optimizing weight allocation and client sampling. Finally, we introduce a client incorporation mechanism to enhance the scalability of ROSCFL. Extensive experiments demonstrate that ROSCFL achieves strong robustness and scalability, particularly in scenarios where data distribution heterogeneity and data quality heterogeneity coexist.
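The abstract names the building blocks (cluster contribution scores, weight allocation, client sampling) but gives no formulas. The following Python sketch illustrates the general idea of contribution-weighted aggregation only; it is not ROSCFL's actual algorithm. The function names, the clip-and-normalize rule, and the example scores are all hypothetical assumptions made for illustration.

```python
import random


def contribution_weights(scores):
    """Turn per-cluster contribution scores into aggregation weights.

    Assumption (not from the paper): negative scores are clipped to zero
    and the remaining scores are normalized to sum to 1.
    """
    clipped = [max(float(s), 0.0) for s in scores]
    total = sum(clipped)
    if total == 0.0:
        # No cluster has a positive score: fall back to uniform weights.
        return [1.0 / len(clipped)] * len(clipped)
    return [c / total for c in clipped]


def aggregate(cluster_models, weights):
    """Weighted average of flattened per-cluster parameter vectors."""
    dim = len(cluster_models[0])
    return [
        sum(w * model[i] for model, w in zip(cluster_models, weights))
        for i in range(dim)
    ]


def sample_clusters(weights, k, seed=0):
    """Pick up to k distinct clusters, favoring higher weights.

    Clusters with zero weight are never selected.
    """
    rng = random.Random(seed)
    candidates = [i for i, w in enumerate(weights) if w > 0.0]
    chosen = []
    while candidates and len(chosen) < k:
        pick = rng.choices(
            candidates, weights=[weights[i] for i in candidates]
        )[0]
        chosen.append(pick)
        candidates.remove(pick)
    return chosen


# Hypothetical example: the third cluster holds low-quality data.
scores = [0.6, 0.3, -0.2]
weights = contribution_weights(scores)
models = [[1.0, 1.0], [4.0, 4.0], [100.0, 100.0]]
global_model = aggregate(models, weights)
```

In this toy setup the third cluster receives zero weight, so its outlying parameters do not affect the aggregated model and it is never sampled. How ROSCFL actually scores clusters and allocates weights is defined in the paper, not here.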
About the journal:
This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing.
The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics; again covering the full range from the design to the use of our targeted systems.