The Design of ϵ-Optimal Strategy for Two-Person Zero-Sum Markov Games
Kaiyun Xie; Junlin Xiong
IEEE Control Systems Letters, published 2024-10-04. DOI: 10.1109/LCSYS.2024.3474057
https://ieeexplore.ieee.org/document/10705104/
This letter focuses on designing approximate Nash strategies for two-person zero-sum Markov games. Using the receding horizon method, $\epsilon$-optimal strategies are designed to approximate Nash strategies by executing a finite number of Gauss-Seidel iterations. The relationship between the approximation level $\epsilon$ and the number of iterations is also analyzed. Additionally, $\epsilon$-optimal strategies are designed for two scenarios with imprecise parameters. For scenarios with imprecise values, $\epsilon$ is determined by the errors between the imprecise values and the iteration values. This provides a theoretical basis for efficiently designing $\epsilon$-optimal strategies using heuristic algorithms or approximate dynamic programming. For scenarios with imprecise transition probabilities, $\epsilon$ is determined by the errors between the estimated and the practical transition probabilities. This enables the use of pattern recognition technology or other methods to estimate practical transition probabilities when designing $\epsilon$-optimal strategies.
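As a rough illustration of what a finite number of Gauss-Seidel iterations can look like for a discounted zero-sum Markov game, the sketch below sweeps over the states and overwrites each value in place, so that later states in the same sweep already use the updated values. It is not the authors' algorithm: the discounted-reward setting, the restriction to two actions per player (which allows a closed-form matrix-game value), the function names, and the two-state example data are all assumptions made for illustration.

```python
def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best pure-strategy guarantee for the row player
    minimax = min(max(a, c), max(b, d))  # best pure-strategy guarantee for the column player
    if maximin == minimax:               # a pure saddle point exists
        return maximin
    # No saddle point: both players mix, and the 2x2 value has a closed form.
    return (a * d - b * c) / (a + d - b - c)


def stage_matrix(s, V, reward, trans, gamma):
    """Stage game at state s: immediate reward plus discounted expected value."""
    return [[reward[s][a][b]
             + gamma * sum(p * v for p, v in zip(trans[s][a][b], V))
             for b in range(2)]
            for a in range(2)]


def gauss_seidel_sweeps(reward, trans, gamma, sweeps):
    """Run a fixed number of Gauss-Seidel sweeps starting from V = 0."""
    V = [0.0] * len(reward)
    for _ in range(sweeps):
        for s in range(len(V)):
            # In-place update: states visited later in this sweep see the new V[s].
            V[s] = matrix_game_value(stage_matrix(s, V, reward, trans, gamma))
    return V


def residual(V, reward, trans, gamma):
    """Largest change that one more (non in-place) update would make to V."""
    return max(abs(matrix_game_value(stage_matrix(s, V, reward, trans, gamma)) - V[s])
               for s in range(len(V)))


# Hypothetical two-state game: reward[s][a][b] is the payoff to the maximizer,
# trans[s][a][b] is the distribution over next states.
reward = [[[1.0, -1.0], [-1.0, 1.0]],
          [[3.0, 0.0], [1.0, 2.0]]]
trans = [[[[0.5, 0.5], [0.9, 0.1]], [[0.2, 0.8], [0.5, 0.5]]],
         [[[1.0, 0.0], [0.3, 0.7]], [[0.6, 0.4], [0.0, 1.0]]]]
gamma = 0.9

V_few = gauss_seidel_sweeps(reward, trans, gamma, sweeps=5)
V_many = gauss_seidel_sweeps(reward, trans, gamma, sweeps=200)
res_few = residual(V_few, reward, trans, gamma)
res_many = residual(V_many, reward, trans, gamma)
print("residual after 5 sweeps:  ", res_few)
print("residual after 200 sweeps:", res_many)
```

The residual printed at the end only indicates how far the current values are from a fixed point of the update; it is not the $\epsilon$ bound derived in the letter. Games with more than two actions per player would need a general matrix-game solver, for example one based on linear programming, in place of the closed-form 2x2 value.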