Inverse reinforcement learning methods for linear differential games
Authors: Hamed Jabbari Asl, Eiji Uchibe
Journal: Systems & Control Letters, volume 193, Article 105936 (published 2024-10-04)
DOI: 10.1016/j.sysconle.2024.105936
URL: https://www.sciencedirect.com/science/article/pii/S016769112400224X
In this study, we considered the problem of inverse reinforcement learning or estimating the cost function of expert players in multi-player differential games. We proposed two online data-driven solutions for linear–quadratic games that are applicable to systems that fulfill a specific dimension criterion or whose unknown matrices in the cost function conform to a diagonal condition. The first method, which is partially model-free, utilizes the trajectories of expert agents to solve the problem. The second method is entirely model-free and employs the trajectories of both expert and learner agents. We determined the conditions under which the solutions are applicable and identified the necessary requirements for the collected data. We conducted numerical simulations to establish the effectiveness of the proposed methods.
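For orientation, the setting named in the abstract can be written in the textbook notation for an N-player linear-quadratic differential game. This is only a sketch under stated assumptions: the symbols A, B_i, Q_i, R_ij, K_i and P_i below are the standard ones and are not taken from the paper, whose exact formulation, horizon and weighting structure may differ.

```latex
% Assumed standard N-player LQ differential game (notation not from the paper).
% Shared linear dynamics driven by all players' inputs:
\dot{x}(t) = A\,x(t) + \sum_{j=1}^{N} B_j\,u_j(t)

% Quadratic cost of player i (infinite horizon assumed here):
J_i = \int_{0}^{\infty} \Big( x^{\top} Q_i\, x + \sum_{j=1}^{N} u_j^{\top} R_{ij}\, u_j \Big)\,dt

% Feedback Nash strategies and the coupled algebraic Riccati equations they satisfy:
u_i = -K_i\,x, \qquad K_i = R_{ii}^{-1} B_i^{\top} P_i, \qquad A_c = A - \sum_{j=1}^{N} B_j K_j

A_c^{\top} P_i + P_i A_c + Q_i + \sum_{j=1}^{N} K_j^{\top} R_{ij} K_j = 0, \qquad i = 1,\dots,N
```

In this notation, the inverse problem is to recover cost weights (Q_i, R_ij) that make the observed expert behavior a Nash equilibrium, using recorded trajectories of x and u_i rather than known weights. Such weights are in general not unique, which is presumably why the abstract restricts attention to systems satisfying a dimension criterion or a diagonal condition on the unknown matrices.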
About the journal:
Founded in 1981 by two of the pre-eminent control theorists, Roger Brockett and Jan Willems, Systems & Control Letters is one of the leading journals in the field of control theory. The aim of the journal is to allow dissemination of relatively concise but highly original contributions whose high initial quality enables a relatively rapid review process. All aspects of the fields of systems and control are covered, especially mathematically-oriented and theoretical papers that have a clear relevance to engineering, physical and biological sciences, and even economics. Application-oriented papers with sophisticated and rigorous mathematical elements are also welcome.