Automatic design of deterministic sequences of decisions for a repeated imitation game with action-state dependency

Pablo J. Villacorta, Luis Quesada, D. Pelta
2012 IEEE Conference on Computational Intelligence and Games (CIG), published 2012-12-06. DOI: 10.1109/CIG.2012.6374131

Abstract

A repeated conflicting situation between two agents is presented in the context of adversarial decision making. The agents simultaneously choose an action in response to an external event and accumulate a payoff for their decisions. The next event statistically depends on the agents' last choices. The objective of the first agent, called the imitator, is to imitate the behaviour of the other. The second agent tries not to be correctly predicted while, at the same time, choosing actions that yield a high payoff. When the situation is repeated over time, the imitator has the opportunity to learn the adversary's behaviour. In this work, we present a way to automatically design a sequence of deterministic decisions for one of the agents that maximizes the expected payoff while keeping his choices difficult to predict. Determinism provides some practical advantages over the partially randomized strategies investigated in previous works, mainly a reduction in the variance of the payoff obtained when using the strategy.
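To make the setting concrete, the following is a minimal Python sketch of one possible way to model the game described above. Every detail is an assumption for illustration, not the authors' model. The payoff table `PAYOFF`, the transition rule `next_event`, the frequency-counting imitator, and the helper `play` are all hypothetical. Agent B follows a fixed, deterministic list of actions for each event, cycling through it. The imitator guesses the action B has chosen most often so far for the current event, and B earns the payoff only when that guess is wrong. The sketch only evaluates a given sequence. It does not implement the paper's design procedure.

```python
import random

# Hypothetical payoff table (assumption): PAYOFF[event][action] is what agent B
# earns for `action` in response to `event` when the imitator guesses wrong.
PAYOFF = [
    [1.0, 0.8, 0.3],
    [0.4, 1.0, 0.7],
    [0.9, 0.2, 1.0],
]
N_EVENTS = len(PAYOFF)
N_ACTIONS = len(PAYOFF[0])


def next_event(event, guess, action, rng):
    """Hypothetical action-state dependency: the next event is random, but
    weighted towards `action` whenever the imitator missed its guess."""
    weights = [1.0] * N_EVENTS
    if guess != action:
        weights[action % N_EVENTS] += 2.0
    return rng.choices(range(N_EVENTS), weights=weights)[0]


def play(sequences, steps, seed=0):
    """Return agent B's total payoff over `steps` rounds.

    sequences[e] is B's deterministic list of actions for event e, consumed
    cyclically each time e occurs. The imitator (an assumed simple learner)
    predicts B's most frequent past action for the current event.
    """
    rng = random.Random(seed)
    counts = [[0] * N_ACTIONS for _ in range(N_EVENTS)]  # counts[event][action]
    position = [0] * N_EVENTS  # next index into each event's sequence
    event, total = 0, 0.0
    for _ in range(steps):
        c = counts[event]
        # max() returns the first maximal element, so ties go to the lowest action.
        guess = max(range(N_ACTIONS), key=lambda a: c[a])
        seq = sequences[event]
        action = seq[position[event] % len(seq)]
        position[event] += 1
        if guess != action:  # B scores only when it is not imitated correctly
            total += PAYOFF[event][action]
        c[action] += 1  # the imitator observes B's choice
        event = next_event(event, guess, action, rng)
    return total
```

Under these assumptions, always playing the highest-payoff action (`[[0], [1], [2]]`) is learned after one occurrence per event, so it earns at most 2.0 in total. A mixed cycle such as `[[0, 1, 0, 2], [1, 2, 1, 0], [2, 0, 2, 1]]` keeps earning, because after the first cycle the imitator's majority guess is wrong in half of each cycle. This illustrates, in a toy form, the trade-off between high payoff and unpredictability that the abstract describes.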