A deep reinforcement learning-based controller design framework for Lipschitz continuous nonlinear systems

IF 6.8 | Zone 1, Computer Science | COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences Pub Date : 2025-06-27 DOI:10.1016/j.ins.2025.122455

Yuan Li, Siyang Zhao, Jinyong Yu

Information Sciences, Volume 719, Article 122455. Journal Article. https://www.sciencedirect.com/science/article/pii/S0020025525005870

Citations: 0

Abstract

Nonlinear systems have complex dynamics and uncertainty, which makes designing controllers for them a significant challenge. Deep reinforcement learning (DRL) is a promising approach to this problem. However, most reward/value function designs in DRL rely on experience and require considerable trial and error. To reduce this trial cost, this paper proposes a novel DRL method for nonlinear system controller design, called actor-Lyapunov (AL), which builds on the actor-critic (AC) architecture. Unlike the conventional AC architecture, AL eliminates the need for a critic network: the actor network is trained using a type of Lyapunov function as the value function. First, we adopt a normed linear space perspective to clarify the controller design, in which the controller generated by the actor network is regarded as a proper mapping within the state space. Based on this concept, the convergence of the approach under gradient descent is briefly analyzed. Next, a refined value function related to the exponent is introduced to improve the training of the actor network. Finally, simulations validate the efficacy of the approach and illustrate the advantages of the refined value function in improving system performance.
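The abstract gives no equations, so the following is only a rough illustration of the general idea of training an actor against a Lyapunov-type value function with no critic. It is not the authors' AL algorithm. Everything in it is an assumption made for the example: the scalar plant x' = sin(x) + u (chosen because sin is Lipschitz continuous), the two-parameter tanh "actor", the quadratic candidate V(x) = x^2, the finite-difference gradient standing in for backpropagation, the step-halving safeguard, and all constants.

```python
import math

DT = 0.05  # Euler step size (assumed)

def plant_step(x, u):
    # Toy plant x' = sin(x) + u; sin is Lipschitz with constant 1 (assumed example).
    return x + DT * (math.sin(x) + u)

def actor(params, x):
    # Two-parameter stand-in for an actor network: u = w2 * tanh(w1 * x).
    w1, w2 = params
    return w2 * math.tanh(w1 * x)

def lyapunov(x):
    # Quadratic Lyapunov candidate V(x) = x^2, used in place of a learned critic.
    return x * x

def loss(params, starts, horizon=40):
    # Mean of V along closed-loop rollouts from each start state.
    total = 0.0
    for x in starts:
        for _ in range(horizon):
            x = plant_step(x, actor(params, x))
            total += lyapunov(x)
    return total / (len(starts) * horizon)

def gradient(params, starts, eps=1e-5):
    # Central differences, standing in for backpropagation through the actor.
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((loss(up, starts) - loss(down, starts)) / (2 * eps))
    return grads

def train(params, starts, lr=0.5, iters=100):
    # Gradient descent; each step is halved until it lowers the loss,
    # and is skipped if 20 halvings do not help.
    params = list(params)
    current = loss(params, starts)
    for _ in range(iters):
        g = gradient(params, starts)
        step = lr
        for _ in range(20):
            trial = [p - step * gi for p, gi in zip(params, g)]
            trial_loss = loss(trial, starts)
            if trial_loss < current:
                params, current = trial, trial_loss
                break
            step *= 0.5
    return params

STARTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # hypothetical initial states
INIT = [0.5, -0.5]                           # hypothetical initial actor weights
TRAINED = train(INIT, STARTS)
print("loss before:", loss(INIT, STARTS), "after:", loss(TRAINED, STARTS))
```

The point of the sketch is only that the training signal comes directly from a fixed Lyapunov candidate evaluated on closed-loop states, so no separate value network has to be fitted. The refined, exponent-related value function mentioned in the abstract is not reproduced here because the abstract does not define it.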

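Source journal

Information Sciences (Engineering & Technology - Computer Science: Information Systems)

CiteScore: 14.00

Self-citation rate: 17.30%

Articles published: 1322

Review time: 10.4 months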

Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up to date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.