Anticipating Oblivious Opponents in Stochastic Games

arXiv - EE - Systems and Control | Pub Date: 2024-09-18 | DOI: arxiv-2409.11671

Shadi Tasdighi Kalat, Sriram Sankaranarayanan, Ashutosh Trivedi



Abstract

We present an approach for systematically anticipating the actions and policies employed by oblivious environments in concurrent stochastic games, while maximizing a reward function. Our main contribution lies in the synthesis of a finite information state machine whose alphabet ranges over the actions of the environment. Each state of the automaton is mapped to a belief state about the policy used by the environment. We introduce a notion of consistency that guarantees that the belief states tracked by our automaton stay within a fixed distance of the precise belief state obtained by knowledge of the full history. We provide methods for checking consistency of an automaton and a synthesis approach which, upon successful termination, yields such a machine. We show how the information state machine yields an MDP that serves as the starting point for computing optimal policies for maximizing a reward function defined over plays. We present an experimental evaluation over benchmark examples, including human activity data for tasks such as cataract surgery and furniture assembly, wherein our approach successfully anticipates the policies and actions of the environment in order to maximize the reward.
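To make the notion of a "precise belief state obtained by knowledge of the full history" concrete, the following is a minimal sketch, not the authors' construction. It assumes a finite set of candidate environment policies, each modeled as a fixed distribution over environment actions (a simplification chosen for illustration), and it measures the gap between two beliefs with the L1 distance, which is also an assumption since the abstract does not name the metric. The names `update_belief`, `exact_belief`, and `l1_distance` are hypothetical.

```python
from fractions import Fraction


def update_belief(belief, policies, action):
    """One Bayes step: reweight each candidate policy by the
    probability it assigns to the observed environment action."""
    weighted = [b * p[action] for b, p in zip(belief, policies)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("observed action has zero probability under the belief")
    return [w / total for w in weighted]


def exact_belief(prior, policies, history):
    """Belief obtained by replaying the full history of environment actions."""
    belief = list(prior)
    for action in history:
        belief = update_belief(belief, policies, action)
    return belief


def l1_distance(b1, b2):
    """L1 distance between two beliefs (assumed metric, for illustration)."""
    return sum(abs(x - y) for x, y in zip(b1, b2))


# Two hypothetical candidate policies over environment actions "a" and "b".
policies = [
    {"a": Fraction(3, 4), "b": Fraction(1, 4)},
    {"a": Fraction(1, 4), "b": Fraction(3, 4)},
]
prior = [Fraction(1, 2), Fraction(1, 2)]

posterior = exact_belief(prior, policies, ["a", "a", "b"])
print(posterior)                      # belief after the full history
print(l1_distance(prior, posterior))  # how far the belief has moved
```

Under these assumptions, a finite-state tracker in the spirit of the abstract would store one representative belief per automaton state, and a consistency check would compare that stored belief against `exact_belief` for histories reaching the state, using a distance such as `l1_distance`.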


