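ConcertoRL: A reinforcement learning approach for finite-time single-life enhanced control and its application to direct-drive tandem-wing experiment platforms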

IF 3.4, Zone 2 (Computer Science), JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Applied Intelligence, Pub Date: 2024-10-28, DOI: 10.1007/s10489-024-05720-7

Minghao Zhang, Bifeng Song, Changhao Chen, Xinyu Lang, Liang Wang

Applied Intelligence, volume 54, issue 24, pages 13121-13159. https://link.springer.com/article/10.1007/s10489-024-05720-7

Citations: 0

Abstract



Achieving control of mechanical systems using finite-time single-life methods presents significant challenges in safety and efficiency for existing control algorithms. To address these issues, the ConcertoRL algorithm is introduced, featuring two main innovations: a time-interleaved mechanism based on Lipschitz conditions that integrates classical controllers with reinforcement learning-based controllers to enhance initial stage safety under single-life conditions and a policy composer based on finite-time Lyapunov convergence conditions that organizes past learning experiences to ensure efficiency within finite time constraints. Experiments are conducted on Direct-Drive Tandem-Wing Experiment Platforms, a typical mechanical system operating under nonlinear unsteady load conditions. First, compared with established algorithms such as the Soft Actor-Critic (SAC) algorithm, Proximal Policy Optimization (PPO) algorithm, and Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, ConcertoRL demonstrates nearly an order of magnitude performance advantage within the first 500 steps under finite-time single-life conditions. Second, ablation experiments on the time-interleaved mechanism show that introducing this module results in a performance improvement of nearly two orders of magnitude in single-life last average reward. Furthermore, the integration of this module yields a substantial performance boost of approximately 60% over scenarios without reinforcement learning enhancements and a 30% increase in efficiency compared to reference controllers operating at doubled control frequencies. These results highlight the algorithm's ability to create a synergistic effect that exceeds the sum of its parts. Third, ablation studies on the rule-based policy composer further verify its significant impact on enhancing ConcertoRL's convergence speed. 
Finally, experiments on the universality of the ConcertoRL framework demonstrate its compatibility with various classical controllers, consistently achieving excellent control outcomes. ConcertoRL offers a promising approach for mechanical systems under nonlinear, unsteady load conditions. It enables plug-and-play use with high control efficiency under finite-time, single-life constraints. This work sets a new benchmark in control effectiveness for challenges posed by direct-drive platforms under tandem wing influence.
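The abstract gives no implementation details, so the following is only a toy sketch of what interleaving a classical controller with a learned one in time could look like. Everything in it is an assumption made for illustration: the alternating slot order, the proportional gain, the saturated linear map standing in for a learned policy, and the integrator plant. It is not the ConcertoRL algorithm and does not implement the Lipschitz-based or Lyapunov-based conditions the abstract mentions.

```python
from typing import Callable, List

# Hypothetical controller signature: tracking error in, command out.
Controller = Callable[[float], float]


def classical_controller(error: float, kp: float = 2.0) -> float:
    """Stand-in classical controller: a proportional law with an assumed gain."""
    return kp * error


def learned_controller(error: float, limit: float = 1.0) -> float:
    """Placeholder for a learned policy: a saturated linear map, not a trained network."""
    return max(-limit, min(limit, 3.0 * error))


def interleaved_schedule(n_steps: int, period: int = 2) -> List[str]:
    """Label each step: the classical controller takes the first slot of every period."""
    return ["classical" if k % period == 0 else "learned" for k in range(n_steps)]


def simulate(n_steps: int = 50, dt: float = 0.1, target: float = 1.0) -> List[float]:
    """Drive a toy integrator plant x' = u.

    Returns the tracking error before the first step and after every step.
    """
    controllers = {"classical": classical_controller, "learned": learned_controller}
    x = 0.0
    errors = [target - x]
    for name in interleaved_schedule(n_steps):
        u = controllers[name](target - x)  # the scheduled controller acts alone in its slot
        x += dt * u
        errors.append(target - x)
    return errors


if __name__ == "__main__":
    errs = simulate()
    print(f"initial error {errs[0]:.3f}, final error {errs[-1]:.2e}")
```

In this sketch the classical controller acts in every other slot, so a well-behaved command is always issued at a fixed cadence regardless of what the placeholder policy outputs in between. How the published method actually schedules and constrains the two controllers should be taken from the paper itself.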


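Source journal

Applied Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 6.60

Self-citation rate: 20.80%

Articles published: 1361

Review time: 5.9 months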

Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.