Jun Ye, Xiaowei Zhao, Yougang Bian, Manjiang Hu, Hongyang Dong
Nonlinear Dynamics, vol. 113, no. 17, pp. 22973-22999, 2025 (Epub 2025/6/15). Journal Article.
DOI: 10.1007/s11071-025-11329-3 (https://doi.org/10.1007/s11071-025-11329-3)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12321936/pdf/
Optimal control under safety constraints and disturbances: a multi-step, off-policy adaptive dynamic programming approach.
This paper introduces a multi-step, off-policy adaptive dynamic programming approach, in both model-free and model-based variants, for solving optimal control problems under disturbances and safety constraints. To estimate the performance function more accurately in the policy evaluation (PEV) step, we employ an interleaved training method in the model-free scheme and use a prior model in the model-based version, which mitigates the underestimation of the accumulated utility function. To further counteract the underestimation of the terminal performance function, dual critic neural networks are used. Additionally, to balance safety and performance requirements, the original unconstrained policy improvement process is transformed into a constrained optimization task with a far-sighted safety function. Furthermore, an actor-critic-disturbance framework is designed to handle safety constraints during the zero-sum game process, in which the disturbance policy and the performance function are alternately updated during the PEV step. Based on this framework, a rigorous theoretical analysis establishes the convergence property of the proposed method. Finally, simulation results and practical experiments demonstrate the effectiveness and safety of the proposed method.
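As a rough illustration of the multi-step evaluation idea in the abstract, the sketch below builds a target from an accumulated utility over several steps plus a terminal performance estimate from two critics. It is not the authors' formulation: the discount factor `gamma`, the function name `multi_step_target`, and the choice to combine the two critics with `max` (a conservative guard against underestimating a cost-type performance function) are all assumptions made for this example.

```python
from typing import Callable, Sequence


def multi_step_target(
    utilities: Sequence[float],
    terminal_state: float,
    critic_a: Callable[[float], float],
    critic_b: Callable[[float], float],
    gamma: float = 0.95,  # assumed discount factor, not taken from the paper
) -> float:
    """Return an n-step target: discounted utilities plus a terminal estimate."""
    # Accumulated utility over the n observed steps.
    accumulated = sum(gamma**k * u for k, u in enumerate(utilities))
    # Assumption: take the larger of the two critic outputs so that the
    # terminal performance (treated here as a cost) is not underestimated.
    terminal = max(critic_a(terminal_state), critic_b(terminal_state))
    return accumulated + gamma ** len(utilities) * terminal


# Toy usage with hand-written stand-ins for the two critic networks.
target = multi_step_target(
    utilities=[1.0, 0.5, 0.25],
    terminal_state=3.0,
    critic_a=lambda x: 2.0 * x,
    critic_b=lambda x: x + 1.0,
    gamma=0.5,
)
print(target)
```

In a learning setting the two lambdas would be replaced by trained critic networks, and the resulting target would serve as the regression label for the critic update in the PEV step.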
About the journal:
Nonlinear Dynamics provides a forum for the rapid publication of original research in the field. The journal’s scope encompasses all nonlinear dynamic phenomena associated with mechanical, structural, civil, aeronautical, ocean, electrical, and control systems. Review articles and original contributions are based on analytical, computational, and experimental methods.
The journal examines such topics as perturbation and computational methods, symbolic manipulation, dynamic stability, local and global methods, bifurcations, chaos, and deterministic and random vibrations. The journal also investigates Lie groups, multibody dynamics, robotics, fluid-solid interactions, system modeling and identification, friction and damping models, signal analysis, and measurement techniques.