Automatic design of deterministic sequences of decisions for a repeated imitation game with action-state dependency

2012 IEEE Conference on Computational Intelligence and Games (CIG). Pub Date: 2012-12-06. DOI: 10.1109/CIG.2012.6374131

Pablo J. Villacorta, Luis Quesada, D. Pelta


Citations: 0

Abstract


A repeated conflicting situation between two agents is presented in the context of adversarial decision making. The agents simultaneously choose an action as a response to an external event, and accumulate some payoff for their decisions. The next event statistically depends on the last choices of the agents. The objective of the first agent, called the imitator, is to imitate the behaviour of the other. The second agent tries not to be properly predicted while, at the same time, choosing actions that yield a high payoff. When the situation is repeated through time, the imitator has the opportunity to learn the adversary's behaviour. In this work, we present a way to automatically design a sequence of deterministic decisions for one of the agents that maximizes the expected payoff while keeping its choices difficult to predict. Determinism provides some practical advantages over the partially randomized strategies investigated in previous work, mainly a reduction in the variance of the payoff obtained when using the strategy.
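
The interaction described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' method. The names `simulate`, `payoff` and `next_event` are hypothetical. The imitator is assumed to guess the action it has seen most often for the current event. A step's payoff is assumed to count only when that guess is wrong. The event transition is assumed to be deterministic, whereas the abstract describes a statistical dependency on both agents' choices.

```python
from collections import Counter, defaultdict


def simulate(sequence, payoff, next_event, start_event=0):
    """Play a fixed action sequence against a frequency-based imitator.

    sequence   -- list of action indices chosen in order (the deterministic plan)
    payoff     -- payoff[event][action], an assumed payoff table
    next_event -- assumed deterministic rule: (event, action) -> next event
    Returns the payoff accumulated on the steps where the imitator guessed wrong.
    """
    # observed[event][action]: how often the imitator has seen each action per event
    observed = defaultdict(Counter)
    event = start_event
    total = 0.0
    for action in sequence:
        seen = observed[event]
        # Assumed imitator: guess the most frequent past action for this event,
        # or make no guess if the event has not been seen before.
        guess = seen.most_common(1)[0][0] if seen else None
        if guess != action:
            total += payoff[event][action]
        seen[action] += 1
        event = next_event(event, action)
    return total


# Illustrative values only: two events, two actions.
example_payoff = [[4, 2], [1, 3]]


def example_rule(event, action):
    # Hypothetical transition: the chosen action becomes the next event.
    return action


constant_total = simulate([0, 0, 0, 0], example_payoff, example_rule)
alternating_total = simulate([0, 1, 0, 1], example_payoff, example_rule)
```

With these illustrative values, the constant sequence always picks the highest-payoff action but is predicted after the first step, while the alternating sequence accepts lower per-step payoffs and stays harder to predict. Designing a sequence automatically would mean searching over `sequence` for a high total, which this sketch does not attempt.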
