Optimizing Weights to Fit Parametric Operation Policies for Generalized Working Conditions in Linear Systems Using Deep Reinforcement Learning

IF 9.9 · CAS Tier 1 (Computer Science) · JCR Q1 (Automation & Control Systems)
Ruiyu Qiu;Guanghui Yang;Zuhua Xu;Zhijiang Shao
{"title":"Optimizing Weights to Fit Parametric Operation Policies for Generalized Working Conditions in Linear Systems Using Deep Reinforcement Learning","authors":"Ruiyu Qiu;Guanghui Yang;Zuhua Xu;Zhijiang Shao","doi":"10.1109/TII.2024.3523563","DOIUrl":null,"url":null,"abstract":"At present, working conditions are becoming more complex, and operation policy requirements are more diverse in process system engineering. To control a process problem, a balance must be found between speed and stability, in that operations should sometimes be faster and other times smoother. Traditional controllers, such as PID and model predictive control are applied in various problems, and some parameters in controllers can be used to represent the operation policy. However, there can be difficulties in tuning parameters, and time costs of online calculation. This article proposes parametric deep reinforcement learning (PDRL) to replace traditional controllers. PDRL has two parts. A vanilla DRL framework is adapted to solve the setpoint tracking problem. With a state and a reward function and robust training tricks, trained agents can be applied to more generalized working conditions. Base agents of different operation policies are trained in advance. With target performance from operators, the target policy can be fitted by base agents with a set of weights, which are first optimized by minimizing the squared error between the target and fitted policy in a basic task, and applied to generalized conditions. A shell benchmark problem is chosen as a case study, whose results show that PDRL has feasibility and stability both in basic and generalized tasks, even in a noisy environment.","PeriodicalId":13301,"journal":{"name":"IEEE Transactions on Industrial Informatics","volume":"21 4","pages":"3186-3195"},"PeriodicalIF":9.9000,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Industrial Informatics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10836915/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

In process systems engineering, working conditions are becoming more complex and operation policy requirements more diverse. Controlling a process requires a balance between speed and stability: operations should sometimes respond faster and at other times more smoothly. Traditional controllers, such as PID and model predictive control, are applied to a wide range of problems, and some of their parameters can be used to represent the operation policy. However, tuning those parameters can be difficult, and online calculation can be costly in time. This article proposes parametric deep reinforcement learning (PDRL) to replace traditional controllers. PDRL has two parts. First, a vanilla DRL framework is adapted to the setpoint tracking problem; with a suitable state, reward function, and robust training tricks, trained agents can be applied to more generalized working conditions. Second, base agents with different operation policies are trained in advance. Given a target performance specified by operators, the target policy is fitted by the base agents with a set of weights, which are optimized by minimizing the squared error between the target and fitted policies on a basic task and then applied to generalized conditions. The Shell benchmark problem is chosen as a case study, and the results show that PDRL is feasible and stable in both basic and generalized tasks, even in a noisy environment.
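The abstract's weight-fitting step can be illustrated with a minimal sketch: given response trajectories of pre-trained base agents on a basic task and an operator-specified target response, solve a least-squares problem for the mixing weights. This is not the authors' code; the function names (`fit_weights`, `blended_action`), the unconstrained least-squares formulation, and the toy first-order responses standing in for "faster" vs. "smoother" base agents are all illustrative assumptions.

```python
# Minimal sketch of fitting weights so a combination of base agents matches
# an operator-specified target response (assumptions noted in the lead-in).
import numpy as np


def fit_weights(base_responses: np.ndarray, target_response: np.ndarray) -> np.ndarray:
    """Least-squares weights minimizing ||y_target - sum_k w_k * y_k||^2.

    base_responses: shape (K, T), one trajectory per base agent on the basic task.
    target_response: shape (T,), the operator-specified target behavior.
    """
    # Unconstrained linear least squares; any constraints the paper places on
    # the weights (e.g., nonnegativity) are not reproduced here.
    w, *_ = np.linalg.lstsq(base_responses.T, target_response, rcond=None)
    return w


def blended_action(weights: np.ndarray, base_actions: np.ndarray) -> np.ndarray:
    """Combine the base agents' actions with the fitted weights at run time.

    base_actions: shape (K, n_inputs), one action vector per base agent.
    """
    return weights @ base_actions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 200
    t = np.linspace(0.0, 10.0, T)
    # Toy first-order step responses with different time constants stand in
    # for base agents tuned toward faster or smoother operation policies.
    base = np.stack([1.0 - np.exp(-t / tau) for tau in (0.5, 1.5, 3.0)])
    target = 1.0 - np.exp(-t / 1.0) + 0.01 * rng.standard_normal(T)
    w = fit_weights(base, target)
    print("fitted weights:", np.round(w, 3))
```

In this toy setup the fitted weights reproduce the intermediate-speed target as a mix of the fast and slow base responses; the paper applies the same weights to generalized working conditions after fitting them on the basic task.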
Source Journal
IEEE Transactions on Industrial Informatics
Category: Engineering & Technology – Engineering, Industrial
CiteScore: 24.10
Self-citation rate: 8.90%
Articles per year: 1202
Review time: 5.1 months
Journal introduction: The IEEE Transactions on Industrial Informatics is a multidisciplinary journal dedicated to publishing technical papers that connect theory with practical applications of informatics in industrial settings. It focuses on the utilization of information in intelligent, distributed, and agile industrial automation and control systems. The scope includes topics such as knowledge-based and AI-enhanced automation, intelligent computer control systems, flexible and collaborative manufacturing, industrial informatics in software-defined vehicles and robotics, computer vision, industrial cyber-physical and industrial IoT systems, real-time and networked embedded systems, security in industrial processes, industrial communications, systems interoperability, and human-machine interaction.