{"title":"无监督机器学习方法在训练和在线实施强化学习中的行为-批评结构中的应用","authors":"Daniel Beahr, Debangsu Bhattacharyya","doi":"10.1016/j.compchemeng.2025.109392","DOIUrl":null,"url":null,"abstract":"<div><div>A fundamental obstacle to the implementation of reinforcement learning (RL) to continuous systems is the large amount of data and training that must take place to achieve a satisfactory control policy. This is exacerbated when the focus is an online implementation. It is the goal of this work to investigate the use of unsupervised learning to make more efficient decisions with the data available, both for learning and exploration in the typical RL algorithm. Gaussian mixture models (GMMs) are used to form a probabilistic prediction for the outcome of proposed actions during exploration, while high-performing data points are subsequently over-sampled to accelerate learning and convergence. With respect to the exploration policy, GMMs are used to predict the outcomes for given actions and used for preventing undesired exploratory actions that can lead to significant loss in control performance and/or violation of safety or other operational constraints. The proposed approach was integrated within a Deep Deterministic Policy Gradient algorithm and was applied to the control of a selective catalytic reduction unit. 
It was found that a satisfactory policy was found faster and with less overall performance degradation than the standard RL approaches.</div></div>","PeriodicalId":286,"journal":{"name":"Computers & Chemical Engineering","volume":"204 ","pages":"Article 109392"},"PeriodicalIF":3.9000,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Application of unsupervised machine learning methods to actor–critic structures in reinforcement learning for training and online implementation\",\"authors\":\"Daniel Beahr, Debangsu Bhattacharyya\",\"doi\":\"10.1016/j.compchemeng.2025.109392\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>A fundamental obstacle to the implementation of reinforcement learning (RL) to continuous systems is the large amount of data and training that must take place to achieve a satisfactory control policy. This is exacerbated when the focus is an online implementation. It is the goal of this work to investigate the use of unsupervised learning to make more efficient decisions with the data available, both for learning and exploration in the typical RL algorithm. Gaussian mixture models (GMMs) are used to form a probabilistic prediction for the outcome of proposed actions during exploration, while high-performing data points are subsequently over-sampled to accelerate learning and convergence. With respect to the exploration policy, GMMs are used to predict the outcomes for given actions and used for preventing undesired exploratory actions that can lead to significant loss in control performance and/or violation of safety or other operational constraints. The proposed approach was integrated within a Deep Deterministic Policy Gradient algorithm and was applied to the control of a selective catalytic reduction unit. 
It was found that a satisfactory policy was found faster and with less overall performance degradation than the standard RL approaches.</div></div>\",\"PeriodicalId\":286,\"journal\":{\"name\":\"Computers & Chemical Engineering\",\"volume\":\"204 \",\"pages\":\"Article 109392\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Chemical Engineering\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0098135425003953\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Chemical Engineering","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0098135425003953","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
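The GMM-based screening of exploratory actions described above could be sketched as follows. This is a minimal illustration, not the authors' implementation: the data, the kernel-weighted cost estimator, and the threshold are all hypothetical assumptions.

```python
# Hypothetical sketch: screening exploratory actions with a Gaussian mixture
# model (GMM) fit on past (state, action, cost) data. All names, shapes, and
# the weighting scheme are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic past experience: states, actions, and resulting cost (lower is better).
states = rng.normal(size=(500, 2))
actions = rng.normal(size=(500, 1))
costs = (states[:, :1] + actions) ** 2 + 0.1 * rng.normal(size=(500, 1))

# Fit a GMM over the joint (state, action, cost) space.
data = np.hstack([states, actions, costs])
gmm = GaussianMixture(n_components=3, random_state=0).fit(data)

def predict_cost(state, action, n_samples=200):
    """Monte Carlo estimate of the expected cost of a proposed action:
    draw samples from the GMM and weight their cost coordinate by
    closeness to the queried (state, action) point."""
    samples, _ = gmm.sample(n_samples)
    point = np.hstack([state, action])
    d = np.linalg.norm(samples[:, :3] - point, axis=1)
    w = np.exp(-d)  # simple kernel weights; an illustrative choice
    return float(np.sum(w * samples[:, 3]) / np.sum(w))

def screen_action(state, proposed_action, cost_limit):
    """Reject exploratory actions whose predicted cost exceeds the limit,
    e.g. ones likely to violate an operational constraint."""
    return predict_cost(state, proposed_action) <= cost_limit
```

In a DDPG-style loop, `screen_action` would sit between the noisy exploration policy and the plant, replacing a rejected action with the deterministic (or a resampled) one.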
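The over-sampling of high-performing data points to accelerate learning could be sketched as a reward-weighted replay buffer. The weighting scheme below is an illustrative assumption, not the authors' exact method.

```python
# Hypothetical sketch: a replay buffer that draws high-reward transitions
# more often than uniform sampling would. The softmax weighting and the
# temperature parameter are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

class WeightedReplayBuffer:
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.transitions = []   # (state, action, reward, next_state)
        self.rewards = []

    def add(self, transition):
        if len(self.transitions) >= self.capacity:
            # Drop the oldest transition once capacity is reached.
            self.transitions.pop(0)
            self.rewards.pop(0)
        self.transitions.append(transition)
        self.rewards.append(transition[2])

    def sample(self, batch_size, temperature=1.0):
        """Sample a minibatch with probability increasing in reward, so
        high-performing points are over-represented relative to uniform."""
        r = np.asarray(self.rewards, dtype=float)
        logits = (r - r.max()) / temperature   # shift for numerical stability
        p = np.exp(logits)
        p /= p.sum()
        idx = rng.choice(len(self.transitions), size=batch_size, p=p)
        return [self.transitions[i] for i in idx]
```

Lowering `temperature` concentrates sampling on the best transitions; raising it recovers near-uniform replay, trading faster convergence against the bias such over-sampling introduces.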
Journal introduction:
Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.