Beyond consensus and synchrony in decentralized online optimization using saddle point method

A. S. Bedi, Alec Koppel, K. Rajawat
{"title":"Beyond consensus and synchrony in decentralized online optimization using saddle point method","authors":"A. S. Bedi, Alec Koppel, K. Rajawat","doi":"10.1109/ACSSC.2017.8335186","DOIUrl":null,"url":null,"abstract":"We consider online learning problems in multiagent systems comprised of distinct subsets of agents operating without a common time-scale. Each individual in the network is charged with minimizing the global regret, which is a sum of the instantaneous sub-optimality of each agent's actions with respect to a fixed global clairvoyant actor with access to all costs across the network for all time up to a time-horizon T. Since agents are not assumed to be of the same type, the hypothesis that all agents seek a common action is violated, and thus we instead introduce a notion of network discrepancy as a measure of how well agents coordinate their behavior while retaining distinct local behavior. Moreover, agents are not assumed to receive the sequentially arriving costs on a common time index, and thus seek to learn in an asynchronous manner. A variant of the Arrow-Hurwicz saddle point algorithm is proposed to control the growth of global regret and network discrepancy. This algorithm uses Lagrange multipliers to penalize the discrepancies between agents and leads to an implementation that relies on local operations and exchange of variables between neighbors. Decisions made with this method lead to regret whose order is O(√T) and network discrepancy O(T3/4). Empirical evaluation is conducted on an asynchronously operating sensor network estimating a spatially correlated random field.","PeriodicalId":296208,"journal":{"name":"2017 51st Asilomar Conference on Signals, Systems, and Computers","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 51st Asilomar Conference on Signals, Systems, and Computers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACSSC.2017.8335186","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

We consider online learning problems in multiagent systems composed of distinct subsets of agents operating without a common time-scale. Each individual in the network is charged with minimizing the global regret, which is a sum of the instantaneous sub-optimality of each agent's actions with respect to a fixed global clairvoyant actor with access to all costs across the network for all time up to a time-horizon T. Since agents are not assumed to be of the same type, the hypothesis that all agents seek a common action is violated; we instead introduce a notion of network discrepancy as a measure of how well agents coordinate their behavior while retaining distinct local behavior. Moreover, agents are not assumed to receive the sequentially arriving costs on a common time index, and thus seek to learn in an asynchronous manner. A variant of the Arrow-Hurwicz saddle point algorithm is proposed to control the growth of both global regret and network discrepancy. This algorithm uses Lagrange multipliers to penalize the discrepancies between agents and leads to an implementation that relies only on local operations and exchanges of variables between neighbors. Decisions made with this method attain regret of order O(√T) and network discrepancy of order O(T^{3/4}). Empirical evaluation is conducted on an asynchronously operating sensor network estimating a spatially correlated random field.
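The abstract gives no pseudocode, but a minimal sketch of an asynchronous primal-dual (Arrow-Hurwicz-style) round of the kind described above might look as follows. The function name `saddle_point_step`, the quadratic form of the edge constraint, and the tolerance parameter `gamma` are illustrative assumptions, not the paper's exact formulation; consult the paper (DOI above) for the actual updates and step-size choices.

```python
import numpy as np

def saddle_point_step(x, lam, grads, adjacency, eta, active, gamma=0.0):
    """One asynchronous primal-dual round (illustrative sketch only).

    x:         (n, d) array of agents' current decisions
    lam:       (n, n) array of nonnegative dual variables lam[i, j]
    grads:     dict mapping each active agent i to the gradient of its
               current local cost evaluated at x[i]
    adjacency: (n, n) 0/1 symmetric matrix of the communication graph
    eta:       step size
    active:    set of agents that wake up this round (asynchrony)
    gamma:     tolerated per-edge discrepancy (agents need not agree exactly)
    """
    n, d = x.shape
    x_new, lam_new = x.copy(), lam.copy()
    for i in active:
        # Primal descent: local cost gradient plus Lagrangian coupling terms
        # from hypothetical edge constraints  ||x_i - x_j||^2 <= gamma.
        g = grads[i].copy()
        for j in range(n):
            if adjacency[i, j]:
                g += 2.0 * (lam[i, j] + lam[j, i]) * (x[i] - x[j])
        x_new[i] = x[i] - eta * g
        # Dual ascent: grow the multiplier when a neighbor disagrees by more
        # than gamma, then project back onto the nonnegative orthant.
        for j in range(n):
            if adjacency[i, j]:
                slack = np.dot(x[i] - x[j], x[i] - x[j]) - gamma
                lam_new[i, j] = max(0.0, lam[i, j] + eta * slack)
    return x_new, lam_new

# Toy usage: three agents on a line graph tracking a noisy quadratic cost,
# with random wake-ups standing in for the lack of a common time-scale.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
x, lam = rng.normal(size=(3, 2)), np.zeros((3, 3))
for t in range(100):
    active = {i for i in range(3) if rng.random() < 0.7}
    grads = {i: 2.0 * (x[i] - np.array([1.0, -1.0])) for i in active}
    x, lam = saddle_point_step(x, lam, grads, A, eta=0.05, active=active)
```

In this sketch only the agents in `active` update each round, mimicking the asynchronous operation described in the abstract, and each update uses only an agent's own gradient and its neighbors' variables. The dual ascent step is how the Lagrange multipliers penalize network discrepancy: a multiplier grows while two neighbors disagree beyond the tolerance, tightening their coupling, without ever forcing exact consensus.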