Authors: A. S. Bedi, Alec Koppel, K. Rajawat
Venue: 2017 51st Asilomar Conference on Signals, Systems, and Computers
Published: 2017-10-01
DOI: 10.1109/ACSSC.2017.8335186 (https://doi.org/10.1109/ACSSC.2017.8335186)
Beyond consensus and synchrony in decentralized online optimization using saddle point method
We consider online learning problems in multiagent systems composed of distinct subsets of agents operating without a common time-scale. Each individual in the network is charged with minimizing the global regret, which is the sum of the instantaneous sub-optimality of each agent's actions with respect to a fixed global clairvoyant actor with access to all costs across the network at all times up to a time horizon T. Since agents are not assumed to be of the same type, the hypothesis that all agents seek a common action is violated, and thus we instead introduce a notion of network discrepancy as a measure of how well agents coordinate their behavior while retaining distinct local behavior. Moreover, agents are not assumed to receive the sequentially arriving costs on a common time index, and thus seek to learn in an asynchronous manner. A variant of the Arrow-Hurwicz saddle point algorithm is proposed to control the growth of global regret and network discrepancy. This algorithm uses Lagrange multipliers to penalize the discrepancies between agents and leads to an implementation that relies on local operations and exchange of variables between neighbors. Decisions made with this method lead to regret of order O(√T) and network discrepancy of order O(T^(3/4)). Empirical evaluation is conducted on an asynchronously operating sensor network estimating a spatially correlated random field.
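To make the primal-dual idea in the abstract concrete, the following is a minimal sketch of a generic Arrow-Hurwicz-style saddle point iteration, not the authors' exact algorithm. Everything specific in it is an assumption made for illustration: the quadratic local cost 0.5*(x_i - target_i)^2, the proximity constraint 0.5*(x_i - x_j)^2 - gamma <= 0 on each edge, the three-node line graph, the constant step size 1/√T, and the names `saddle_point_step` and `run`. The sketch is also synchronous; the asynchronous operation described in the abstract is left out.

```python
import random


def saddle_point_step(x, lam, edges, targets, gamma, eps):
    """One synchronous primal-descent / dual-ascent step (illustrative only).

    x       : list of scalar decisions, one per agent
    lam     : dict mapping edge (i, j) -> nonnegative Lagrange multiplier
    targets : per-agent minimizers of the assumed cost 0.5*(x_i - target_i)^2
    gamma   : assumed tolerance in the constraint 0.5*(x_i - x_j)^2 - gamma <= 0
    eps     : step size shared by the primal and dual updates
    """
    # Gradient of each agent's assumed local quadratic cost.
    grad = [xi - ti for xi, ti in zip(x, targets)]

    # Add the gradient of the multiplier-weighted edge constraints.
    # Each term only needs the decision of a neighboring agent.
    for (i, j) in edges:
        diff = x[i] - x[j]
        grad[i] += lam[(i, j)] * diff
        grad[j] -= lam[(i, j)] * diff

    # Primal descent on the Lagrangian.
    new_x = [xi - eps * gi for xi, gi in zip(x, grad)]

    # Dual ascent on the constraint value at the current (pre-update)
    # decisions, projected onto the nonnegative reals.
    new_lam = {}
    for (i, j) in edges:
        h = 0.5 * (x[i] - x[j]) ** 2 - gamma
        new_lam[(i, j)] = max(0.0, lam[(i, j)] + eps * h)

    return new_x, new_lam


def run(T=2000, seed=0):
    """Run the sketch on a hypothetical 3-agent line graph with noisy costs."""
    rng = random.Random(seed)
    edges = [(0, 1), (1, 2)]      # hypothetical topology
    means = [0.0, 1.0, 2.0]       # hypothetical, distinct local optima
    gamma = 0.05                  # hypothetical discrepancy tolerance
    eps = 1.0 / T ** 0.5          # assumed constant step size of order 1/sqrt(T)

    x = [0.0] * len(means)
    lam = {e: 0.0 for e in edges}
    for _ in range(T):
        # Costs arrive sequentially: each round reveals new noisy targets.
        targets = [m + rng.gauss(0.0, 0.1) for m in means]
        x, lam = saddle_point_step(x, lam, edges, targets, gamma, eps)
    return x, lam


if __name__ == "__main__":
    decisions, multipliers = run()
    print("decisions:", decisions)
    print("multipliers:", multipliers)
```

In this sketch each multiplier grows while its edge constraint is violated and shrinks toward zero otherwise, so neighboring agents are pulled closer without being forced to agree exactly. Each agent's update uses only its own cost gradient and its neighbors' decisions, which mirrors the local-operations structure the abstract describes.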