Anticipating Oblivious Opponents in Stochastic Games

Shadi Tasdighi Kalat, Sriram Sankaranarayanan, Ashutosh Trivedi
arXiv - EE - Systems and Control, published 2024-09-18. DOI: arxiv-2409.11671 (https://doi.org/arxiv-2409.11671)

Abstract

We present an approach for systematically anticipating the actions and policies employed by oblivious environments in concurrent stochastic games, while maximizing a reward function. Our main contribution is the synthesis of a finite information state machine whose alphabet ranges over the actions of the environment. Each state of the automaton is mapped to a belief state about the policy used by the environment. We introduce a notion of consistency which guarantees that the belief states tracked by our automaton stay within a fixed distance of the precise belief state obtained from knowledge of the full history. We provide methods for checking the consistency of an automaton, and a synthesis approach which, upon successful termination, yields such a machine. We show how the information state machine yields an MDP that serves as the starting point for computing optimal policies that maximize a reward function defined over plays. We present an experimental evaluation over benchmark examples, including human activity data for tasks such as cataract surgery and furniture assembly, wherein our approach successfully anticipates the policies and actions of the environment in order to maximize the reward.
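As a rough illustration of the belief-tracking idea described in the abstract, the following Python sketch compares an exact Bayesian belief over environment policies with the belief tracked by a small finite machine. It is a toy under stated assumptions, not the paper's synthesis or consistency-checking procedure: the two policies, their action probabilities, the grid-rounding construction of the machine, the use of total variation distance, and the bounded-horizon check are all hypothetical choices made for this example.

```python
from itertools import product

# Hypothetical setup: the oblivious environment commits to one of two
# policies, each a fixed distribution over its actions "a" and "b".
POLICIES = {
    "mostly_a": {"a": 0.9, "b": 0.1},
    "mostly_b": {"a": 0.2, "b": 0.8},
}
PRIOR = {"mostly_a": 0.5, "mostly_b": 0.5}
ACTIONS = ("a", "b")

# Hypothetical finite machine: states 1..GRID-1, where state i stands for
# the belief P(mostly_a) = i / GRID. Its alphabet is ACTIONS.
GRID = 20
START = GRID // 2


def bayes_update(belief, action):
    """Posterior over policies after observing one environment action."""
    weighted = {p: belief[p] * POLICIES[p][action] for p in belief}
    total = sum(weighted.values())
    return {p: w / total for p, w in weighted.items()}


def exact_belief(history):
    """Precise belief obtained from the full history of actions."""
    belief = dict(PRIOR)
    for action in history:
        belief = bayes_update(belief, action)
    return belief


def state_belief(state):
    """Belief that the machine associates with one of its states."""
    p = state / GRID
    return {"mostly_a": p, "mostly_b": 1.0 - p}


def step(state, action):
    """Machine transition: update the state's belief, round to the grid."""
    posterior = bayes_update(state_belief(state), action)
    index = round(posterior["mostly_a"] * GRID)
    # Clamp so the machine never reaches the absorbing beliefs 0 and 1.
    return min(max(index, 1), GRID - 1)


def tv_distance(b1, b2):
    """Total variation distance between two beliefs over the policies."""
    return 0.5 * sum(abs(b1[p] - b2[p]) for p in b1)


def max_deviation(horizon):
    """Largest gap between machine and exact belief over all histories
    of length at most `horizon` (a bounded check only)."""
    worst = 0.0
    for n in range(horizon + 1):
        for history in product(ACTIONS, repeat=n):
            state = START
            for action in history:
                state = step(state, action)
            gap = tv_distance(state_belief(state), exact_belief(history))
            worst = max(worst, gap)
    return worst


if __name__ == "__main__":
    print("worst deviation up to horizon 8:", max_deviation(8))
```

Enumerating histories only bounds the deviation up to the chosen horizon. The consistency notion in the abstract concerns the full history, so a guarantee of that kind would need an argument over the machine's structure rather than enumeration.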