Application of unsupervised machine learning methods to actor–critic structures in reinforcement learning for training and online implementation

IF 3.9 · CAS Region 2 (Engineering & Technology) · JCR Q2, Computer Science, Interdisciplinary Applications
Daniel Beahr, Debangsu Bhattacharyya
{"title":"无监督机器学习方法在训练和在线实施强化学习中的行为-批评结构中的应用","authors":"Daniel Beahr,&nbsp;Debangsu Bhattacharyya","doi":"10.1016/j.compchemeng.2025.109392","DOIUrl":null,"url":null,"abstract":"<div><div>A fundamental obstacle to the implementation of reinforcement learning (RL) to continuous systems is the large amount of data and training that must take place to achieve a satisfactory control policy. This is exacerbated when the focus is an online implementation. It is the goal of this work to investigate the use of unsupervised learning to make more efficient decisions with the data available, both for learning and exploration in the typical RL algorithm. Gaussian mixture models (GMMs) are used to form a probabilistic prediction for the outcome of proposed actions during exploration, while high-performing data points are subsequently over-sampled to accelerate learning and convergence. With respect to the exploration policy, GMMs are used to predict the outcomes for given actions and used for preventing undesired exploratory actions that can lead to significant loss in control performance and/or violation of safety or other operational constraints. The proposed approach was integrated within a Deep Deterministic Policy Gradient algorithm and was applied to the control of a selective catalytic reduction unit. It was found that a satisfactory policy was found faster and with less overall performance degradation than the standard RL approaches.</div></div>","PeriodicalId":286,"journal":{"name":"Computers & Chemical Engineering","volume":"204 ","pages":"Article 109392"},"PeriodicalIF":3.9000,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Application of unsupervised machine learning methods to actor–critic structures in reinforcement learning for training and online implementation\",\"authors\":\"Daniel Beahr,&nbsp;Debangsu Bhattacharyya\",\"doi\":\"10.1016/j.compchemeng.2025.109392\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>A fundamental obstacle to the implementation of reinforcement learning (RL) to continuous systems is the large amount of data and training that must take place to achieve a satisfactory control policy. This is exacerbated when the focus is an online implementation. It is the goal of this work to investigate the use of unsupervised learning to make more efficient decisions with the data available, both for learning and exploration in the typical RL algorithm. Gaussian mixture models (GMMs) are used to form a probabilistic prediction for the outcome of proposed actions during exploration, while high-performing data points are subsequently over-sampled to accelerate learning and convergence. With respect to the exploration policy, GMMs are used to predict the outcomes for given actions and used for preventing undesired exploratory actions that can lead to significant loss in control performance and/or violation of safety or other operational constraints. The proposed approach was integrated within a Deep Deterministic Policy Gradient algorithm and was applied to the control of a selective catalytic reduction unit. 
It was found that a satisfactory policy was found faster and with less overall performance degradation than the standard RL approaches.</div></div>\",\"PeriodicalId\":286,\"journal\":{\"name\":\"Computers & Chemical Engineering\",\"volume\":\"204 \",\"pages\":\"Article 109392\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Chemical Engineering\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0098135425003953\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Chemical Engineering","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0098135425003953","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

A fundamental obstacle to the application of reinforcement learning (RL) to continuous systems is the large amount of data and training required to achieve a satisfactory control policy. This is exacerbated when the focus is an online implementation. The goal of this work is to investigate the use of unsupervised learning to make more efficient decisions with the available data, both for learning and for exploration in a typical RL algorithm. Gaussian mixture models (GMMs) are used to form probabilistic predictions of the outcomes of proposed actions during exploration, while high-performing data points are subsequently over-sampled to accelerate learning and convergence. With respect to the exploration policy, GMMs are used to predict the outcomes of given actions and to prevent undesired exploratory actions that can lead to significant loss in control performance and/or violation of safety or other operational constraints. The proposed approach was integrated within a Deep Deterministic Policy Gradient algorithm and applied to the control of a selective catalytic reduction unit. A satisfactory policy was found faster, and with less overall performance degradation, than with standard RL approaches.
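The abstract summarizes two mechanisms without implementation detail. As a rough illustration only, the Python sketch below shows one plausible reading: a GMM fitted on past (state, action, reward) tuples is conditioned on a candidate state–action pair to estimate its reward distribution and veto risky exploratory actions, and high-reward transitions are over-sampled when drawing replay minibatches. All function names, thresholds, and the use of scikit-learn/SciPy are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of GMM-screened exploration and reward-based over-sampling.
# This is NOT the paper's code; names, thresholds, and libraries are assumed.
import numpy as np
from scipy.stats import multivariate_normal, norm
from sklearn.mixture import GaussianMixture


def fit_outcome_gmm(states, actions, rewards, n_components=5):
    """Fit a joint GMM over [state, action, reward] vectors."""
    X = np.hstack([states, actions, rewards.reshape(-1, 1)])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=0)
    gmm.fit(X)
    return gmm


def predict_reward(gmm, state, action):
    """Condition the joint GMM on z = [state, action]; return the mixture
    mean and the per-component (weight, mean, variance) of the reward."""
    z = np.concatenate([state, action])
    d = z.size  # dimension of the conditioning block
    parts = []
    for w, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        mu_z, mu_y = mu[:d], mu[d]
        S_zz, S_zy, S_yy = cov[:d, :d], cov[:d, d], cov[d, d]
        # Standard Gaussian conditioning for each component
        gain = np.linalg.solve(S_zz, S_zy)
        cond_mean = mu_y + gain @ (z - mu_z)
        cond_var = max(S_yy - S_zy @ gain, 1e-9)
        # Responsibility of this component for the query point z
        lik = w * multivariate_normal.pdf(z, mean=mu_z, cov=S_zz,
                                          allow_singular=True)
        parts.append((lik, cond_mean, cond_var))
    total = sum(p[0] for p in parts) + 1e-300
    comps = [(lik / total, m, v) for lik, m, v in parts]
    mean = sum(w * m for w, m, _ in comps)
    return mean, comps


def action_is_safe(gmm, state, action, reward_floor, max_risk=0.1):
    """Veto an exploratory action if P(reward < reward_floor) is too high."""
    _, comps = predict_reward(gmm, state, action)
    risk = sum(w * norm.cdf(reward_floor, loc=m, scale=np.sqrt(v))
               for w, m, v in comps)
    return risk < max_risk


def sample_minibatch(buffer, rewards, batch_size=64, top_frac=0.2, boost=4.0):
    """Over-sample high-performing data: the top `top_frac` of transitions
    (by reward) is `boost` times more likely to be drawn."""
    idx_sorted = np.argsort(rewards)
    weights = np.ones(len(buffer))
    n_top = max(1, int(top_frac * len(buffer)))
    weights[idx_sorted[-n_top:]] = boost
    p = weights / weights.sum()
    picks = np.random.choice(len(buffer), size=batch_size, p=p)
    return [buffer[i] for i in picks]
```

In the paper the GMM is integrated within a DDPG agent; in a sketch like this, `action_is_safe` would wrap the actor's exploratory action before it is applied to the plant, and `sample_minibatch` would replace uniform replay sampling during critic updates.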
Source journal
Computers & Chemical Engineering (Engineering & Technology – Chemical Engineering)
CiteScore: 8.70
Self-citation rate: 14.00%
Articles per year: 374
Review time: 70 days
Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.