Differentially Private Linear Bandits with Partial Distributed Feedback

Fengjiao Li, Xingyu Zhou, Bo Ji
{"title":"Differentially Private Linear Bandits with Partial Distributed Feedback","authors":"Fengjiao Li, Xingyu Zhou, Bo Ji","doi":"10.23919/WiOpt56218.2022.9930524","DOIUrl":null,"url":null,"abstract":"In this paper, we study the problem of global reward maximization with only partial distributed feedback. This problem is motivated by several real-world applications (e.g., cellular network configuration, dynamic pricing, and policy selection) where an action taken by a central entity influences a large population that contributes to the global reward. However, collecting such reward feedback from the entire population not only incurs a prohibitively high cost, but often leads to privacy concerns. To tackle this problem, we consider differentially private distributed linear bandits, where only a subset of users from the population are selected (called clients) to participate in the learning process and the central server learns the global model from such partial feedback by iteratively aggregating these clients’ local feedback in a differentially private fashion. We then propose a unified algorithmic learning framework, called differentially private distributed phased elimination (DP-DPE), which can be naturally integrated with popular differential privacy (DP) models (including central DP, local DP, and shuffle DP). Furthermore, we prove that DP-DPE achieves both sublinear regret and sublinear communication cost. Interestingly, DP-DPE also achieves privacy protection “for free” in the sense that the additional cost due to privacy guarantees is a lower-order additive term. 
Finally, we conduct simulations to corroborate our theoretical results and demonstrate the effectiveness of DP-DPE.","PeriodicalId":228040,"journal":{"name":"2022 20th International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks (WiOpt)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 20th International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks (WiOpt)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/WiOpt56218.2022.9930524","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

In this paper, we study the problem of global reward maximization with only partial distributed feedback. This problem is motivated by several real-world applications (e.g., cellular network configuration, dynamic pricing, and policy selection) where an action taken by a central entity influences a large population that contributes to the global reward. However, collecting such reward feedback from the entire population not only incurs a prohibitively high cost but also often raises privacy concerns. To tackle this problem, we consider differentially private distributed linear bandits, where only a subset of users (called clients) is selected from the population to participate in the learning process, and the central server learns the global model from such partial feedback by iteratively aggregating these clients’ local feedback in a differentially private fashion. We then propose a unified algorithmic learning framework, called differentially private distributed phased elimination (DP-DPE), which can be naturally integrated with popular differential privacy (DP) models (including central DP, local DP, and shuffle DP). Furthermore, we prove that DP-DPE achieves both sublinear regret and sublinear communication cost. Interestingly, DP-DPE also achieves privacy protection “for free” in the sense that the additional cost due to the privacy guarantee is a lower-order additive term. Finally, we conduct simulations to corroborate our theoretical results and demonstrate the effectiveness of DP-DPE.
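To make the privacy-plus-elimination idea concrete, the toy sketch below illustrates one phase of local-DP phased elimination on a finite set of arms. It is not the paper's DP-DPE algorithm (which operates in the linear-bandit setting with more refined phase lengths and noise calibration); the function names `dp_phase` and `laplace_noise`, the confidence-width constant, and the reward model are all illustrative assumptions. Each sampled client perturbs its own feedback with Laplace noise before reporting (local DP), the server averages the noisy reports, and arms whose estimated mean falls below the best estimate by more than a confidence width are eliminated.

```python
import math
import random

def laplace_noise(scale):
    # Laplace(0, scale) sampled as the difference of two i.i.d. exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_phase(arms, true_means, num_clients, epsilon, reward_bound=1.0):
    """One elimination phase (illustrative, not the paper's DP-DPE).

    Each of `num_clients` sampled clients reports a locally perturbed
    reward for every surviving arm; the server averages the noisy
    reports and keeps only arms that are plausibly optimal.
    """
    estimates = {}
    for arm in arms:
        reports = []
        for _ in range(num_clients):
            # Toy local feedback: true mean plus small observation noise.
            reward = true_means[arm] + random.gauss(0, 0.1)
            # Local DP via the Laplace mechanism:
            # noise scale = sensitivity / epsilon, sensitivity = reward range.
            reports.append(reward + laplace_noise(reward_bound / epsilon))
        estimates[arm] = sum(reports) / num_clients
    best = max(estimates.values())
    # Confidence width shrinking with the number of clients (toy constant).
    width = 2.0 * math.sqrt(math.log(10) / num_clients)
    survivors = [a for a in arms if estimates[a] >= best - width]
    return survivors, estimates
```

Averaging over many clients is what makes privacy nearly "free" here: the per-client Laplace noise has constant scale, so its effect on the averaged estimate shrinks as the number of sampled clients grows, adding only a lower-order term to the estimation error.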