Dynamic Historical Data-Based Reinforcement Learning for Pursuit–Evasion Games of Nonholonomic Vehicles With Input Saturation
Authors: Fei Zhang; Guang-Hong Yang
Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 55, no. 10, pp. 7539-7550
Published: 2025-08-22
DOI: 10.1109/TSMC.2025.3595891
URL: https://ieeexplore.ieee.org/document/11134315/
Journal metrics: impact factor 8.7; JCR Q1 (Automation & Control Systems)
This article studies the pursuit–evasion game between nonholonomic vehicles subject to input saturation, in which the pursuer aims to intercept an evasive opponent. Unlike previous game research that neglects practical kinematic constraints, a coupled nonlinear system is formulated to describe the interaction dynamics between the players. The optimal control strategies are then derived by solving the Hamilton–Jacobi–Isaacs (HJI) equation associated with a special nonquadratic cost function, and the Nash equilibrium and finite-time capturability are analyzed. To learn the optimal pursuit–evasion strategy pair, a fixed-time convergent reinforcement learning (RL) algorithm is proposed; it uses a novel residual design that drives weight updates by collecting current and historical data and evaluating them according to information quality. Existing RL methods converge slowly because they rely on an asymptotic learning rule and the stringent persistent excitation (PE) condition. The proposed RL algorithm relaxes PE to a finite excitation (FE) condition that is easily achievable and verifiable online, allowing rapid weight convergence within a fixed period. Simulations and comparisons validate the effectiveness and superiority of the proposed method, showing a 61% reduction in convergence time relative to prevailing RL schemes.
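To make the setting concrete, the sketch below simulates two nonholonomic vehicles with saturated inputs. The abstract does not state the vehicle model or the control laws, so everything here is an assumption: the standard unicycle kinematics, hard clamping as the saturation model, and a heuristic line-of-sight steering rule that stands in for the HJI-derived strategies (it is not the paper's RL algorithm). The names `Vehicle`, `saturate`, `step`, and `simulate`, and all numeric values, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Vehicle:
    x: float      # position, m
    y: float      # position, m
    theta: float  # heading, rad


def saturate(u, limit):
    """Clamp a command to the symmetric interval [-limit, limit]."""
    return max(-limit, min(limit, u))


def wrap_angle(a):
    """Map an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def step(vehicle, v_cmd, w_cmd, v_max, w_max, dt):
    """One forward-Euler step of unicycle kinematics with saturated inputs.

    Assumed model: x' = v cos(theta), y' = v sin(theta), theta' = w.
    """
    v = saturate(v_cmd, v_max)
    w = saturate(w_cmd, w_max)
    return Vehicle(
        x=vehicle.x + dt * v * math.cos(vehicle.theta),
        y=vehicle.y + dt * v * math.sin(vehicle.theta),
        theta=vehicle.theta + dt * w,
    )


def simulate(pursuer, evader, capture_radius=0.2, dt=0.01, t_max=60.0):
    """Return the capture time in seconds, or None if no capture by t_max.

    Heuristic steering (an assumption, not the optimal strategy pair):
    the pursuer turns toward the line of sight, and the evader aligns
    its heading with the same line of sight so that it moves away.
    """
    t = 0.0
    while t < t_max:
        dx, dy = evader.x - pursuer.x, evader.y - pursuer.y
        if math.hypot(dx, dy) <= capture_radius:
            return t
        los = math.atan2(dy, dx)  # line-of-sight angle, pursuer to evader
        w_p = 4.0 * wrap_angle(los - pursuer.theta)
        w_e = 4.0 * wrap_angle(los - evader.theta)
        # Speed commands deliberately exceed the limits to exercise saturation.
        pursuer = step(pursuer, 10.0, w_p, v_max=1.5, w_max=2.0, dt=dt)
        evader = step(evader, 10.0, w_e, v_max=1.0, w_max=2.0, dt=dt)
        t += dt
    return None
```

Calling `simulate(Vehicle(0.0, 0.0, 0.0), Vehicle(5.0, 3.0, math.pi / 2))` returns a capture time because the assumed pursuer speed limit is higher than the evader's. With equal or reversed limits the function may return `None`.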
About the journal:
The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.