Automatic design of deterministic sequences of decisions for a repeated imitation game with action-state dependency
Pablo J. Villacorta, Luis Quesada, D. Pelta
2012 IEEE Conference on Computational Intelligence and Games (CIG), published 2012-12-06
DOI: 10.1109/CIG.2012.6374131 (https://doi.org/10.1109/CIG.2012.6374131)
A repeated conflict between two agents is studied in the context of adversarial decision making. The agents simultaneously choose an action in response to an external event and accumulate payoff for their decisions. The next event depends statistically on the agents' last choices. The first agent, called the imitator, aims to imitate the behaviour of the other. The second agent tries to avoid being correctly predicted while, at the same time, choosing actions that yield a high payoff. When the situation is repeated over time, the imitator has the opportunity to learn the adversary's behaviour. In this work, we present a way to automatically design a deterministic sequence of decisions for one of the agents that maximizes the expected payoff while keeping that agent's choices difficult to predict. Determinism offers some practical advantages over the partially randomized strategies investigated in previous works, mainly a reduction in the variance of the payoff obtained when using the strategy.
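To make the setting concrete, the following is a minimal simulation sketch of one possible reading of the game, not the authors' model or design method. Everything specific in it is an assumption introduced for illustration: the payoff table `PAYOFF`, the transition rule `next_event`, the frequency-counting imitator, the rule that the agent earns payoff only when the imitator's guess misses, and the choice to represent a deterministic strategy as a fixed list of actions, one per step.

```python
import random

N_EVENTS, N_ACTIONS = 3, 3

# Hypothetical payoff table: PAYOFF[event][action] is what the agent earns
# for that action when the imitator fails to match it (assumed rule).
PAYOFF = [
    [1.0, 0.6, 0.2],
    [0.3, 1.0, 0.5],
    [0.4, 0.2, 1.0],
]

def next_event(action, guess, rng):
    """Assumed transition: the next event usually follows from both choices."""
    if rng.random() < 0.8:
        return (action + guess) % N_EVENTS
    return rng.randrange(N_EVENTS)

def simulate(sequence, seed=0):
    """Play a fixed action sequence against a frequency-counting imitator.

    Returns the total payoff collected by the agent being imitated.
    """
    rng = random.Random(seed)
    # counts[e][a]: how often the imitator has seen action a after event e.
    counts = [[0] * N_ACTIONS for _ in range(N_EVENTS)]
    event, total = 0, 0.0
    for action in sequence:
        # The imitator guesses in proportion to what it has observed so far;
        # add-one smoothing keeps unseen actions possible.
        weights = [c + 1 for c in counts[event]]
        guess = rng.choices(range(N_ACTIONS), weights=weights)[0]
        if guess != action:
            total += PAYOFF[event][action]
        counts[event][action] += 1
        event = next_event(action, guess, rng)
    return total

def mean_payoff(sequence, runs=200):
    """Average payoff of one sequence over several imitator random seeds."""
    return sum(simulate(sequence, seed=s) for s in range(runs)) / runs

if __name__ == "__main__":
    constant = [0] * 30       # always the same action: easy to learn
    varied = [0, 1, 2] * 10   # cycles through all actions
    print("constant:", mean_payoff(constant))
    print("varied:  ", mean_payoff(varied))
```

Under these assumptions, a candidate sequence can be scored with `mean_payoff`, and a search procedure of any kind could compare candidates by that score. The abstract does not say which search method the authors use, so none is sketched here.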