{"title":"可计数mdp中点收益、平均收益和总收益目标的策略复杂性","authors":"Richard Mayr, Eric Munday","doi":"10.46298/lmcs-19(1:16)2023","DOIUrl":null,"url":null,"abstract":"We study countably infinite Markov decision processes (MDPs) with real-valued\ntransition rewards. Every infinite run induces the following sequences of\npayoffs: 1. Point payoff (the sequence of directly seen transition rewards), 2.\nMean payoff (the sequence of the sums of all rewards so far, divided by the\nnumber of steps), and 3. Total payoff (the sequence of the sums of all rewards\nso far). For each payoff type, the objective is to maximize the probability\nthat the $\\liminf$ is non-negative. We establish the complete picture of the\nstrategy complexity of these objectives, i.e., how much memory is necessary and\nsufficient for $\\varepsilon$-optimal (resp. optimal) strategies. Some cases can\nbe won with memoryless deterministic strategies, while others require a step\ncounter, a reward counter, or both.","PeriodicalId":314387,"journal":{"name":"Log. Methods Comput. Sci.","volume":"107 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Strategy Complexity of Point Payoff, Mean Payoff and Total Payoff Objectives in Countable MDPs\",\"authors\":\"Richard Mayr, Eric Munday\",\"doi\":\"10.46298/lmcs-19(1:16)2023\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study countably infinite Markov decision processes (MDPs) with real-valued\\ntransition rewards. Every infinite run induces the following sequences of\\npayoffs: 1. Point payoff (the sequence of directly seen transition rewards), 2.\\nMean payoff (the sequence of the sums of all rewards so far, divided by the\\nnumber of steps), and 3. Total payoff (the sequence of the sums of all rewards\\nso far). For each payoff type, the objective is to maximize the probability\\nthat the $\\\\liminf$ is non-negative. We establish the complete picture of the\\nstrategy complexity of these objectives, i.e., how much memory is necessary and\\nsufficient for $\\\\varepsilon$-optimal (resp. optimal) strategies. Some cases can\\nbe won with memoryless deterministic strategies, while others require a step\\ncounter, a reward counter, or both.\",\"PeriodicalId\":314387,\"journal\":{\"name\":\"Log. Methods Comput. Sci.\",\"volume\":\"107 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Log. Methods Comput. Sci.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.46298/lmcs-19(1:16)2023\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Log. Methods Comput. Sci.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.46298/lmcs-19(1:16)2023","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Strategy Complexity of Point Payoff, Mean Payoff and Total Payoff Objectives in Countable MDPs
We study countably infinite Markov decision processes (MDPs) with real-valued transition rewards. Every infinite run induces the following sequences of payoffs:

1. Point payoff (the sequence of directly seen transition rewards),
2. Mean payoff (the sequence of the sums of all rewards so far, divided by the number of steps), and
3. Total payoff (the sequence of the sums of all rewards so far).

For each payoff type, the objective is to maximize the probability that the $\liminf$ of the payoff sequence is non-negative. We establish the complete picture of the strategy complexity of these objectives, i.e., how much memory is necessary and sufficient for $\varepsilon$-optimal (resp. optimal) strategies. Some cases can be won with memoryless deterministic strategies, while others require a step counter, a reward counter, or both.
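
For concreteness, the three payoff sequences induced by a run with transition rewards $r_0, r_1, r_2, \ldots$ can be written as follows; the symbols $p_n$, $t_n$, $m_n$ are illustrative notation chosen here, not taken from the paper:
\[
  p_n = r_n, \qquad
  t_n = \sum_{i=0}^{n} r_i, \qquad
  m_n = \frac{t_n}{n+1},
\]
and for each payoff sequence $x \in \{p, t, m\}$ the objective is to maximize $\Pr\bigl(\liminf_{n\to\infty} x_n \ge 0\bigr)$. For instance, on a run with alternating rewards $+1, -1, +1, -1, \ldots$ the point payoff has $\liminf$ equal to $-1$, so it fails the objective, while the total payoff oscillates between $1$ and $0$ and the mean payoff tends to $0$, so both have $\liminf$ equal to $0$ and satisfy it.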