Value Iteration-Based Distributed Adaptive Dynamic Programming for Multi-Player Differential Game with Incomplete Information
Authors: Yun Zhang, Yuqi Wang, Yunze Cai
Journal: IEEE/CAA Journal of Automatica Sinica, vol. 12, no. 2, pp. 436-447 (journal article, published 2025-01-20)
DOI: 10.1109/JAS.2024.124950
URL: https://ieeexplore.ieee.org/document/10846926/
Journal metrics: JCR Q1 (Automation & Control Systems); impact factor 15.3; category: Computer Science
Citation count: 0
Abstract
In this paper, a distributed adaptive dynamic programming (ADP) framework based on value iteration is proposed for multi-player differential games. In the game setting, players have no access to other players' system parameters or control laws. Each player adopts an on-policy value iteration algorithm as its basic learning framework. To cope with this incomplete information structure, players collect system trajectory data over a period of time to compensate for the missing information. The policy update step is implemented as a nonlinear optimization problem that searches for the proximal admissible policy. Theoretical analysis shows that, by adopting proximal policy search rules, the approximated policies can converge to a neighborhood of the equilibrium policies. The efficacy of the method is illustrated by three examples, which also demonstrate that it can accelerate learning compared with a centralized learning framework.
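The abstract describes each player running value iteration while using trajectory data to make up for what it does not know about the other players. The Python block below is a minimal sketch of that general idea on an assumed scalar two-player linear-quadratic game (dx/dt = a x + b0 u0 + b1 u1, player i's cost q_i x^2 + r_i u_i^2, no cross-control weights). It is not the authors' algorithm: the functions `simulate`, `estimate_drift`, and `value_iteration` are hypothetical, the update is a plain continuous-time value-iteration step p_i ← p_i + eps·(Riccati residual of player i), and the paper's proximal policy search via nonlinear optimization is omitted. The only data-driven element is a least-squares estimate of the closed-loop drift a_cl from sampled states, which lets each player's update use only its own b_i, q_i, r_i instead of the other player's parameters or gain.

```python
# Hypothetical scalar two-player LQ game (illustrative assumption, not from the paper):
#   dx/dt = a*x + b[0]*u0 + b[1]*u1,   u_i = -k_i * x
#   cost of player i: integral of q[i]*x^2 + r[i]*u_i^2 dt


def simulate(a, b, k, x0=1.0, dt=0.01, steps=50):
    """Euler-simulate the closed loop under gains k; stands in for the real plant."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        u0, u1 = -k[0] * x, -k[1] * x
        xs.append(x + dt * (a * x + b[0] * u0 + b[1] * u1))
    return xs


def estimate_drift(xs, dt=0.01):
    """Least-squares fit of dx/dt ~ a_cl * x using only the sampled states."""
    num = sum(x0 * (x1 - x0) / dt for x0, x1 in zip(xs, xs[1:]))
    den = sum(x0 * x0 for x0 in xs[:-1])
    return num / den


def value_iteration(a, b, q, r, eps=0.05, iters=2000):
    """Iterate p_i <- p_i + eps * (Riccati residual of player i), starting from p = 0."""
    p = [0.0, 0.0]
    for _ in range(iters):
        k = [b[i] * p[i] / r[i] for i in range(2)]  # each player's current gain
        a_cl = estimate_drift(simulate(a, b, k))  # closed-loop drift learned from data
        # Residual of player i: 2*a_cl*p_i + q_i + (b_i^2 / r_i) * p_i^2,
        # which needs only player i's own b, q, r plus the estimated drift.
        p = [p[i] + eps * (2 * a_cl * p[i] + q[i] + b[i] ** 2 / r[i] * p[i] ** 2)
             for i in range(2)]
    return p


if __name__ == "__main__":
    print(value_iteration(a=1.0, b=[1.0, 1.0], q=[1.0, 1.0], r=[1.0, 1.0]))
```

With a = 1 and b = q = r = (1, 1), the coupled Riccati equations reduce to 3p² − 2p − 1 = 0 for each player, whose stabilizing root is p = 1 (closed-loop drift 1 − 1 − 1 = −1), and the iteration approaches p ≈ [1.0, 1.0]. The step size eps and iteration count are illustrative choices; convergence of such a coupled iteration is not guaranteed in general and depends on them.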
About the journal:
The IEEE/CAA Journal of Automatica Sinica is a reputable journal that publishes high-quality papers in English on original theoretical/experimental research and development in the field of automation. The journal covers a wide range of topics including automatic control, artificial intelligence and intelligent control, systems theory and engineering, pattern recognition and intelligent systems, automation engineering and applications, information processing and information systems, network-based automation, robotics, sensing and measurement, and navigation, guidance, and control.
Additionally, the journal is abstracted/indexed in several prominent databases including SCIE (Science Citation Index Expanded), EI (Engineering Index), Inspec, Scopus, SCImago, DBLP, CNKI (China National Knowledge Infrastructure), CSCD (Chinese Science Citation Database), and IEEE Xplore.