{"title":"有向时变网络上基于共识的分布式Exp3策略","authors":"Tomoki NAKAMURA, Naoki HAYASHI, Masahiro INUIGUCHI","doi":"10.1587/transfun.2023map0008","DOIUrl":null,"url":null,"abstract":"In this paper, we consider distributed decision-making over directed time-varying multi-agent systems. We consider an adversarial bandit problem in which a group of agents chooses an option from among multiple arms to maximize the total reward. In the proposed method, each agent cooperatively searches for the optimal arm with the highest reward by a consensus-based distributed Exp3 policy. To this end, each agent exchanges the estimation of the reward of each arm and the weight for exploitation with the nearby agents on the network. To unify the explored information of arms, each agent mixes the estimation and the weight of the nearby agents with their own values by a consensus dynamics. Then, each agent updates the probability distribution of arms by combining the Hedge algorithm and the uniform search. We show that the sublinearity of a pseudo-regret can be achieved by appropriately setting the parameters of the distributed Exp3 policy.","PeriodicalId":55003,"journal":{"name":"Ieice Transactions on Fundamentals of Electronics Communications and Computer Sciences","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Consensus-Based Distributed Exp3 Policy Over Directed Time-Varying Networks\",\"authors\":\"Tomoki NAKAMURA, Naoki HAYASHI, Masahiro INUIGUCHI\",\"doi\":\"10.1587/transfun.2023map0008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we consider distributed decision-making over directed time-varying multi-agent systems. We consider an adversarial bandit problem in which a group of agents chooses an option from among multiple arms to maximize the total reward. In the proposed method, each agent cooperatively searches for the optimal arm with the highest reward by a consensus-based distributed Exp3 policy. To this end, each agent exchanges the estimation of the reward of each arm and the weight for exploitation with the nearby agents on the network. To unify the explored information of arms, each agent mixes the estimation and the weight of the nearby agents with their own values by a consensus dynamics. Then, each agent updates the probability distribution of arms by combining the Hedge algorithm and the uniform search. We show that the sublinearity of a pseudo-regret can be achieved by appropriately setting the parameters of the distributed Exp3 policy.\",\"PeriodicalId\":55003,\"journal\":{\"name\":\"Ieice Transactions on Fundamentals of Electronics Communications and Computer Sciences\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.4000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Ieice Transactions on Fundamentals of Electronics Communications and Computer Sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1587/transfun.2023map0008\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ieice Transactions on Fundamentals of Electronics Communications and Computer Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1587/transfun.2023map0008","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Consensus-Based Distributed Exp3 Policy Over Directed Time-Varying Networks
In this paper, we consider distributed decision-making over directed time-varying multi-agent systems. We consider an adversarial bandit problem in which a group of agents chooses an option from among multiple arms to maximize the total reward. In the proposed method, each agent cooperatively searches for the optimal arm with the highest reward by a consensus-based distributed Exp3 policy. To this end, each agent exchanges the estimation of the reward of each arm and the weight for exploitation with the nearby agents on the network. To unify the explored information of arms, each agent mixes the estimation and the weight of the nearby agents with their own values by a consensus dynamics. Then, each agent updates the probability distribution of arms by combining the Hedge algorithm and the uniform search. We show that the sublinearity of a pseudo-regret can be achieved by appropriately setting the parameters of the distributed Exp3 policy.
期刊介绍:
Includes reports on research, developments, and examinations performed by the Society''s members for the specific fields shown in the category list such as detailed below, the contents of which may advance the development of science and industry:
(1) Reports on new theories, experiments with new contents, or extensions of and supplements to conventional theories and experiments.
(2) Reports on development of measurement technology and various applied technologies.
(3) Reports on the planning, design, manufacture, testing, or operation of facilities, machinery, parts, materials, etc.
(4) Presentation of new methods, suggestion of new angles, ideas, systematization, software, or any new facts regarding the above.