Application of unsupervised machine learning methods to actor–critic structures in reinforcement learning for training and online implementation

IF 3.9 · CAS Region 2 (Engineering & Technology) · JCR Q2, Computer Science, Interdisciplinary Applications
Daniel Beahr, Debangsu Bhattacharyya
{"title":"无监督机器学习方法在训练和在线实施强化学习中的行为-批评结构中的应用","authors":"Daniel Beahr,&nbsp;Debangsu Bhattacharyya","doi":"10.1016/j.compchemeng.2025.109392","DOIUrl":null,"url":null,"abstract":"<div><div>A fundamental obstacle to the implementation of reinforcement learning (RL) to continuous systems is the large amount of data and training that must take place to achieve a satisfactory control policy. This is exacerbated when the focus is an online implementation. It is the goal of this work to investigate the use of unsupervised learning to make more efficient decisions with the data available, both for learning and exploration in the typical RL algorithm. Gaussian mixture models (GMMs) are used to form a probabilistic prediction for the outcome of proposed actions during exploration, while high-performing data points are subsequently over-sampled to accelerate learning and convergence. With respect to the exploration policy, GMMs are used to predict the outcomes for given actions and used for preventing undesired exploratory actions that can lead to significant loss in control performance and/or violation of safety or other operational constraints. The proposed approach was integrated within a Deep Deterministic Policy Gradient algorithm and was applied to the control of a selective catalytic reduction unit. It was found that a satisfactory policy was found faster and with less overall performance degradation than the standard RL approaches.</div></div>","PeriodicalId":286,"journal":{"name":"Computers & Chemical Engineering","volume":"204 ","pages":"Article 109392"},"PeriodicalIF":3.9000,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Application of unsupervised machine learning methods to actor–critic structures in reinforcement learning for training and online implementation\",\"authors\":\"Daniel Beahr,&nbsp;Debangsu Bhattacharyya\",\"doi\":\"10.1016/j.compchemeng.2025.109392\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>A fundamental obstacle to the implementation of reinforcement learning (RL) to continuous systems is the large amount of data and training that must take place to achieve a satisfactory control policy. This is exacerbated when the focus is an online implementation. It is the goal of this work to investigate the use of unsupervised learning to make more efficient decisions with the data available, both for learning and exploration in the typical RL algorithm. Gaussian mixture models (GMMs) are used to form a probabilistic prediction for the outcome of proposed actions during exploration, while high-performing data points are subsequently over-sampled to accelerate learning and convergence. With respect to the exploration policy, GMMs are used to predict the outcomes for given actions and used for preventing undesired exploratory actions that can lead to significant loss in control performance and/or violation of safety or other operational constraints. The proposed approach was integrated within a Deep Deterministic Policy Gradient algorithm and was applied to the control of a selective catalytic reduction unit. 
It was found that a satisfactory policy was found faster and with less overall performance degradation than the standard RL approaches.</div></div>\",\"PeriodicalId\":286,\"journal\":{\"name\":\"Computers & Chemical Engineering\",\"volume\":\"204 \",\"pages\":\"Article 109392\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Chemical Engineering\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0098135425003953\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Chemical Engineering","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0098135425003953","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

A fundamental obstacle to the application of reinforcement learning (RL) to continuous systems is the large amount of data and training required to achieve a satisfactory control policy. This is exacerbated when the focus is an online implementation. The goal of this work is to investigate the use of unsupervised learning to make more efficient decisions with the available data, both for learning and for exploration in a typical RL algorithm. Gaussian mixture models (GMMs) are used to form probabilistic predictions of the outcomes of proposed actions during exploration, while high-performing data points are subsequently over-sampled to accelerate learning and convergence. With respect to the exploration policy, GMMs are used to predict the outcomes of given actions and to prevent undesired exploratory actions that can lead to significant loss in control performance and/or violation of safety or other operational constraints. The proposed approach was integrated within a Deep Deterministic Policy Gradient algorithm and applied to the control of a selective catalytic reduction unit. A satisfactory policy was found faster, and with less overall performance degradation, than with standard RL approaches.
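The abstract summarizes two mechanisms without implementation detail. As a rough illustration only, the Python sketch below shows one plausible reading: a GMM fitted on past (state, action, reward) tuples is conditioned on a candidate state–action pair to estimate its reward distribution and veto risky exploratory actions, and high-reward transitions are over-sampled when drawing replay minibatches. All function names, thresholds, and the use of scikit-learn/SciPy are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of GMM-screened exploration and reward-based over-sampling.
# This is NOT the paper's code; names, thresholds, and libraries are assumed.
import numpy as np
from scipy.stats import multivariate_normal, norm
from sklearn.mixture import GaussianMixture


def fit_outcome_gmm(states, actions, rewards, n_components=5):
    """Fit a joint GMM over [state, action, reward] vectors."""
    X = np.hstack([states, actions, rewards.reshape(-1, 1)])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=0)
    gmm.fit(X)
    return gmm


def predict_reward(gmm, state, action):
    """Condition the joint GMM on z = [state, action]; return the mixture
    mean and the per-component (weight, mean, variance) of the reward."""
    z = np.concatenate([state, action])
    d = z.size  # dimension of the conditioning block
    parts = []
    for w, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        mu_z, mu_y = mu[:d], mu[d]
        S_zz, S_zy, S_yy = cov[:d, :d], cov[:d, d], cov[d, d]
        # Standard Gaussian conditioning for each component
        gain = np.linalg.solve(S_zz, S_zy)
        cond_mean = mu_y + gain @ (z - mu_z)
        cond_var = max(S_yy - S_zy @ gain, 1e-9)
        # Responsibility of this component for the query point z
        lik = w * multivariate_normal.pdf(z, mean=mu_z, cov=S_zz,
                                          allow_singular=True)
        parts.append((lik, cond_mean, cond_var))
    total = sum(p[0] for p in parts) + 1e-300
    comps = [(lik / total, m, v) for lik, m, v in parts]
    mean = sum(w * m for w, m, _ in comps)
    return mean, comps


def action_is_safe(gmm, state, action, reward_floor, max_risk=0.1):
    """Veto an exploratory action if P(reward < reward_floor) is too high."""
    _, comps = predict_reward(gmm, state, action)
    risk = sum(w * norm.cdf(reward_floor, loc=m, scale=np.sqrt(v))
               for w, m, v in comps)
    return risk < max_risk


def sample_minibatch(buffer, rewards, batch_size=64, top_frac=0.2, boost=4.0):
    """Over-sample high-performing data: the top `top_frac` of transitions
    (by reward) is `boost` times more likely to be drawn."""
    idx_sorted = np.argsort(rewards)
    weights = np.ones(len(buffer))
    n_top = max(1, int(top_frac * len(buffer)))
    weights[idx_sorted[-n_top:]] = boost
    p = weights / weights.sum()
    picks = np.random.choice(len(buffer), size=batch_size, p=p)
    return [buffer[i] for i in picks]
```

In the paper the GMM is integrated within a DDPG agent; in a sketch like this, `action_is_safe` would wrap the actor's exploratory action before it is applied to the plant, and `sample_minibatch` would replace uniform replay sampling during critic updates.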
Source journal
Computers & Chemical Engineering (Engineering & Technology – Chemical Engineering)
CiteScore: 8.70
Self-citation rate: 14.00%
Articles per year: 374
Review time: 70 days
Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.