{"title":"Anticipating Oblivious Opponents in Stochastic Games","authors":"Shadi Tasdighi Kalat, Sriram Sankaranarayanan, Ashutosh Trivedi","doi":"arxiv-2409.11671","DOIUrl":null,"url":null,"abstract":"We present an approach for systematically anticipating the actions and\npolicies employed by \\emph{oblivious} environments in concurrent stochastic\ngames, while maximizing a reward function. Our main contribution lies in the\nsynthesis of a finite \\emph{information state machine} whose alphabet ranges\nover the actions of the environment. Each state of the automaton is mapped to a\nbelief state about the policy used by the environment. We introduce a notion of\nconsistency that guarantees that the belief states tracked by our automaton\nstays within a fixed distance of the precise belief state obtained by knowledge\nof the full history. We provide methods for checking consistency of an\nautomaton and a synthesis approach which upon successful termination yields\nsuch a machine. We show how the information state machine yields an MDP that\nserves as the starting point for computing optimal policies for maximizing a\nreward function defined over plays. We present an experimental evaluation over\nbenchmark examples including human activity data for tasks such as cataract\nsurgery and furniture assembly, wherein our approach successfully anticipates\nthe policies and actions of the environment in order to maximize the reward.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - EE - Systems and Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11671","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
We present an approach for systematically anticipating the actions and policies employed by \emph{oblivious} environments in concurrent stochastic games, while maximizing a reward function. Our main contribution lies in the synthesis of a finite \emph{information state machine} whose alphabet ranges over the actions of the environment. Each state of the automaton is mapped to a belief state about the policy used by the environment. We introduce a notion of consistency that guarantees that the belief states tracked by our automaton stay within a fixed distance of the precise belief state obtained from knowledge of the full history. We provide methods for checking consistency of an automaton and a synthesis approach that, upon successful termination, yields such a machine. We show how the information state machine yields an MDP that serves as the starting point for computing optimal policies that maximize a reward function defined over plays. We present an experimental evaluation over benchmark examples, including human activity data for tasks such as cataract surgery and furniture assembly, in which our approach successfully anticipates the policies and actions of the environment in order to maximize the reward.
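To make the belief-tracking and consistency ideas concrete, the following is a minimal sketch, not the paper's algorithm: it performs an exact Bayesian update of a belief over a small, hypothetical set of candidate opponent policies after each observed environment action, and checks an epsilon-consistency condition (tracked belief within a fixed L1 distance of the exact belief). All names, the candidate policy table, the quantization step, and the epsilon value are illustrative assumptions.

```python
import numpy as np

# Hypothetical candidate opponent policies: rows are policies, columns are
# probabilities of each opponent action (purely illustrative numbers).
CANDIDATE_POLICIES = np.array([
    [0.7, 0.2, 0.1],   # candidate policy 0
    [0.1, 0.6, 0.3],   # candidate policy 1
    [0.3, 0.3, 0.4],   # candidate policy 2
])

def belief_update(belief, action):
    """Exact Bayes update of the belief over candidate policies after
    observing one opponent action."""
    likelihood = CANDIDATE_POLICIES[:, action]
    posterior = belief * likelihood
    total = posterior.sum()
    if total == 0.0:
        # Observation impossible under every candidate; keep the prior.
        return belief
    return posterior / total

def is_consistent(tracked_belief, exact_belief, epsilon=0.1):
    """Consistency in the spirit of the abstract: the approximately tracked
    belief stays within a fixed L1 distance of the exact belief."""
    return np.abs(tracked_belief - exact_belief).sum() <= epsilon

# Usage sketch: replay a history of observed opponent actions and compare the
# exact belief with a coarsely quantized one standing in for the belief
# attached to a state of a finite information state machine.
exact = np.full(3, 1.0 / 3.0)
tracked = exact.copy()
for a in [0, 0, 2, 1, 0]:
    exact = belief_update(exact, a)
    tracked = np.round(belief_update(tracked, a), 1)  # coarse quantization
    tracked = tracked / tracked.sum()
    print(a, exact.round(3), tracked.round(3), is_consistent(tracked, exact))
```

In the paper's setting the quantized beliefs would be replaced by the finitely many belief states of the synthesized information state machine, and the resulting product with the game yields the MDP over which optimal reward-maximizing policies are computed.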