Stochastic Two-Player Zero-Sum Learning Differential Games

2019 IEEE 15th International Conference on Control and Automation (ICCA). Pub Date: 2019-07-01. DOI: 10.1109/ICCA.2019.8899568

Mushuang Liu, Yan Wan, F. Lewis, V. Lopez


Citations: 4

Abstract

The two-player zero-sum differential game has been studied extensively, partly because its solution implies $H_{\infty}$ optimality. Existing studies of zero-sum differential games assume either deterministic dynamics or dynamics corrupted by additive noise. In realistic environments, high-dimensional environmental uncertainties often modulate the system dynamics in more complicated ways. In this paper, we study the stochastic two-player zero-sum differential game governed by more general uncertain linear dynamics. We show that the optimal control policies for this game can be found by solving the Hamilton-Jacobi-Bellman (HJB) equation. We prove that, under the derived optimal control policies, the system is asymptotically stable in the mean and reaches the Nash equilibrium. To solve the stochastic two-player zero-sum game online, we design a new policy iteration (PI) algorithm that integrates integral reinforcement learning (IRL) with an efficient uncertainty evaluation method, the multivariate probabilistic collocation method (MPCM). The algorithm provides a fast online solution for the stochastic two-player zero-sum differential game subject to multiple uncertainties in the system dynamics.
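To make the policy iteration idea concrete, the following is a minimal sketch for a much simpler setting than the one the abstract describes: a deterministic, scalar, linear-quadratic zero-sum game with a known model. It is not the paper's IRL/MPCM algorithm. The dynamics $\dot{x} = ax + bu + dw$, the cost $\int (qx^2 + ru^2 - \gamma^2 w^2)\,dt$, and every name in the code are assumptions made for illustration only.

```python
import math


def policy_iteration_scalar_game(a, b, d, q, r, gamma, iters=20):
    """Policy iteration for a deterministic scalar zero-sum LQ game (illustrative).

    Assumed dynamics: dx/dt = a*x + b*u + d*w
    Assumed cost:     integral of (q*x^2 + r*u^2 - gamma^2*w^2) dt
    Player u minimizes with u = -k*x; player w maximizes with w = l*x.
    The zero initial gains are stabilizing only when a < 0.
    """
    k, l = 0.0, 0.0
    p = 0.0
    for _ in range(iters):
        a_cl = a - b * k + d * l  # closed-loop pole under current policies
        if a_cl >= 0.0:
            raise ValueError("current policies are not stabilizing")
        # Policy evaluation: V(x) = p*x^2 satisfies
        # 2*a_cl*p + q + r*k^2 - gamma^2*l^2 = 0
        p = (q + r * k**2 - gamma**2 * l**2) / (-2.0 * a_cl)
        # Policy improvement: stationary points of the Hamiltonian
        k = b * p / r
        l = d * p / gamma**2
    return p, k, l


def game_riccati_residual(p, a, b, d, q, r, gamma):
    """Residual of the scalar game Riccati equation; zero at the fixed point."""
    return 2.0 * a * p + q - (b**2 / r - d**2 / gamma**2) * p**2


if __name__ == "__main__":
    params = dict(a=-1.0, b=1.0, d=1.0, q=1.0, r=1.0, gamma=2.0)
    p, k, l = policy_iteration_scalar_game(**params)
    print("p =", p, "k =", k, "l =", l)
    print("residual =", game_riccati_residual(p, **params))
```

In this simplified setting the quadratic value function turns the HJB equation into a scalar game Riccati equation, so the residual function gives a direct check that the iteration has reached its fixed point. The approach described in the abstract instead evaluates policies online from measured data (IRL) and handles uncertain dynamics through MPCM, neither of which this sketch attempts.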

