Using Synthetic Data to Mitigate Unfairness and Preserve Privacy through Single-Shot Federated Learning

Chia-Yuan Wu, Frank E. Curtis, Daniel P. Robinson
{"title":"Using Synthetic Data to Mitigate Unfairness and Preserve Privacy through Single-Shot Federated Learning","authors":"Chia-Yuan Wu, Frank E. Curtis, Daniel P. Robinson","doi":"arxiv-2409.09532","DOIUrl":null,"url":null,"abstract":"To address unfairness issues in federated learning (FL), contemporary\napproaches typically use frequent model parameter updates and transmissions\nbetween the clients and server. In such a process, client-specific information\n(e.g., local dataset size or data-related fairness metrics) must be sent to the\nserver to compute, e.g., aggregation weights. All of this results in high\ntransmission costs and the potential leakage of client information. As an\nalternative, we propose a strategy that promotes fair predictions across\nclients without the need to pass information between the clients and server\niteratively and prevents client data leakage. For each client, we first use\ntheir local dataset to obtain a synthetic dataset by solving a bilevel\noptimization problem that addresses unfairness concerns during the learning\nprocess. We then pass each client's synthetic dataset to the server, the\ncollection of which is used to train the server model using conventional\nmachine learning techniques (that do not take fairness metrics into account).\nThus, we eliminate the need to handle fairness-specific aggregation weights\nwhile preserving client privacy. Our approach requires only a single\ncommunication between the clients and the server, thus making it\ncomputationally cost-effective, able to maintain privacy, and able to ensuring\nfairness. We present empirical evidence to demonstrate the advantages of our\napproach. The results illustrate that our method effectively uses synthetic\ndata as a means to mitigate unfairness and preserve client privacy.","PeriodicalId":501112,"journal":{"name":"arXiv - CS - Computers and Society","volume":"34 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computers and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.09532","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

To address unfairness issues in federated learning (FL), contemporary approaches typically use frequent model parameter updates and transmissions between the clients and server. In such a process, client-specific information (e.g., local dataset size or data-related fairness metrics) must be sent to the server to compute, e.g., aggregation weights. All of this results in high transmission costs and the potential leakage of client information. As an alternative, we propose a strategy that promotes fair predictions across clients without the need to pass information between the clients and server iteratively, and that prevents client data leakage. For each client, we first use their local dataset to obtain a synthetic dataset by solving a bilevel optimization problem that addresses unfairness concerns during the learning process. We then pass each client's synthetic dataset to the server, and the collection of these datasets is used to train the server model with conventional machine learning techniques (that do not take fairness metrics into account). Thus, we eliminate the need to handle fairness-specific aggregation weights while preserving client privacy. Our approach requires only a single communication between the clients and the server, making it computationally cost-effective while maintaining privacy and ensuring fairness. We present empirical evidence to demonstrate the advantages of our approach. The results illustrate that our method effectively uses synthetic data as a means to mitigate unfairness and preserve client privacy.
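The abstract only sketches the pipeline, so the following is a minimal illustrative sketch and not the authors' implementation. It shows the single-shot flow: each client derives a small synthetic dataset locally and uploads only that dataset once, after which the server trains a conventional model on the pooled synthetic data. For brevity, the client-side bilevel optimization is replaced here by a crude fairness-motivated reweight-and-subsample step; all function names, the generated data, and the fairness proxy are hypothetical.

```python
# Minimal sketch of a single-shot federated pipeline with client-generated synthetic data.
# NOTE: fit_synthetic() is a toy stand-in for the paper's bilevel optimization step.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_client_data(n, shift):
    """Hypothetical local dataset: features X, labels y, and a binary sensitive attribute s."""
    X = rng.normal(shift, 1.0, size=(n, 2))
    s = rng.integers(0, 2, size=n)  # sensitive attribute
    y = (X[:, 0] + 0.5 * s + rng.normal(0, 0.5, n) > shift).astype(int)
    return X, y, s

def fit_synthetic(X, y, s, m=50):
    """Toy stand-in for the client-side step: produce m synthetic points whose
    induced model is less skewed across groups of s. Here we simply upweight the
    group with the lower predicted-positive rate and subsample, rather than solve
    a true bilevel problem."""
    base = LogisticRegression().fit(X, y)
    p = base.predict(X)
    rates = [p[s == g].mean() for g in (0, 1)]
    w = np.where(s == int(np.argmin(rates)), 2.0, 1.0)  # crude fairness proxy
    w /= w.sum()
    idx = rng.choice(len(X), size=m, replace=True, p=w)
    # Jitter the sampled points so raw client rows are not uploaded verbatim.
    return X[idx] + rng.normal(0, 0.1, (m, X.shape[1])), y[idx]

# --- clients: one local pass, one upload of synthetic data only ---
clients = [make_client_data(200, shift) for shift in (0.0, 1.0, 2.0)]
synthetic = [fit_synthetic(X, y, s) for X, y, s in clients]

# --- server: conventional training on the pooled synthetic data, no fairness-aware aggregation ---
Xs = np.vstack([X for X, _ in synthetic])
ys = np.concatenate([y for _, y in synthetic])
server_model = LogisticRegression().fit(Xs, ys)
print("server accuracy on client 0:", server_model.score(clients[0][0], clients[0][1]))
```

The key design point the sketch preserves is that no raw data, dataset sizes, or fairness statistics cross the network, and only one client-to-server transmission occurs; fairness handling is pushed entirely into the local synthetic-data construction.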