{"title":"Byzantine-Resilient Distributed Bandit Online Optimization in Dynamic Environments","authors":"Mengli Wei;Wenwu Yu;Hongzhe Liu;Duxin Chen","doi":"10.1109/TICPS.2024.3410846","DOIUrl":null,"url":null,"abstract":"We consider the constrained multi-agent online optimization problem in dynamic environments that are vulnerable to Byzantine attacks, where some infiltrated agents may deviate from the prescribed update rule and send arbitrary messages. The objective functions are exposed in a bandit form, i.e., only the function value is revealed to each agent at the sampling instance, and held privately by each agent. The agents only exchange information with their neighbors to update decisions, and the collective goal is to minimize the sum of the unattacked agents' objective functions in dynamic environments, where the same function can only be sampled once. To handle this problem, a Byzantine-Resilient Distributed Bandit Online Convex Optimization (BR-DBOCO) algorithm that can tolerate up to \n<inline-formula><tex-math>$\\mathcal {B}$</tex-math></inline-formula>\n Byzantine agents is developed. Specifically, the BR-DBOCO employs the one-point bandit feedback (OPBF) mechanism and state filter to cope with the objective function, which cannot be explicitly expressed in dynamic environments and the arbitrary deviation states caused by Byzantine attacks, respectively. We show that sublinear expected regret is achieved if the accumulative deviation of the comparator sequence also grows sublinearly with a proper exploration parameter. 
Finally, experimental results are presented to illustrate the effectiveness of the proposed algorithm.","PeriodicalId":100640,"journal":{"name":"IEEE Transactions on Industrial Cyber-Physical Systems","volume":"2 ","pages":"154-165"},"PeriodicalIF":0.0000,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Industrial Cyber-Physical Systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10551450/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
We consider constrained multi-agent online optimization in dynamic environments that are vulnerable to Byzantine attacks, where some infiltrated agents may deviate from the prescribed update rule and send arbitrary messages. The objective functions are revealed in bandit form: at each sampling instant, an agent observes only its own function value, which it holds privately. Agents exchange information only with their neighbors to update their decisions, and the collective goal is to minimize the sum of the unattacked agents' objective functions in a dynamic environment where each function can be sampled only once. To handle this problem, we develop a Byzantine-Resilient Distributed Bandit Online Convex Optimization (BR-DBOCO) algorithm that tolerates up to $\mathcal{B}$ Byzantine agents. Specifically, BR-DBOCO employs a one-point bandit feedback (OPBF) mechanism to cope with objective functions that cannot be expressed explicitly in dynamic environments, and a state filter to suppress the arbitrarily deviating states caused by Byzantine attacks. We show that sublinear expected regret is achieved, with a properly chosen exploration parameter, whenever the cumulative deviation of the comparator sequence also grows sublinearly. Finally, experimental results are presented to illustrate the effectiveness of the proposed algorithm.
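The one-point bandit feedback idea the abstract refers to can be illustrated with the standard spherical-smoothing estimator (in the style of Flaxman et al.): query the unknown function once at a randomly perturbed point and scale the observed value into a gradient estimate. This is a minimal sketch, not the paper's exact update; the function name, the Gaussian-normalization sampling of the sphere, and the parameter choices are assumptions for illustration.

```python
import numpy as np

def opbf_gradient(f, x, delta, rng):
    """One-point bandit feedback gradient estimate:
        g = (d / delta) * f(x + delta * u) * u,
    where u is drawn uniformly from the unit sphere. The estimate is
    unbiased for the gradient of the delta-smoothed version of f, and
    uses only a single function evaluation (one "bandit" sample)."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)            # uniform direction on the unit sphere
    return (d / delta) * f(x + delta * u) * u
```

For a linear function the smoothed gradient equals the true gradient, so averaging many such single-query estimates recovers the exact slope; for general convex functions the bias shrinks with the exploration parameter `delta`, which is why the regret bound ties `delta` to the horizon.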
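A common way to build the kind of state filter the abstract mentions is a coordinate-wise trimmed mean: each agent sorts the states received from its neighbors and discards the $\mathcal{B}$ largest and $\mathcal{B}$ smallest values in every coordinate before averaging. The sketch below shows that generic construction only; the paper's actual filter may differ, and the function name and interface are assumptions.

```python
import numpy as np

def trimmed_mean(neighbor_states, B):
    """Coordinate-wise trimmed mean over a stack of neighbor states
    (shape: n_neighbors x d). In each coordinate, drop the B largest
    and B smallest values, then average the remainder. With at most B
    Byzantine neighbors, every surviving value in a coordinate is
    bracketed by honest values."""
    S = np.sort(np.asarray(neighbor_states, dtype=float), axis=0)
    n = S.shape[0]
    if n <= 2 * B:
        raise ValueError("need more than 2B neighbor states to trim safely")
    return S[B:n - B].mean(axis=0)
```

The trimming discards outliers independently per coordinate, so a Byzantine neighbor that sends an extreme value in one coordinate and an honest-looking value in another is filtered only where it deviates.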