Factorial Kernel Dynamic Policy Programming for Vinyl Acetate Monomer Plant Model Control

Yunduan Cui, Lingwei Zhu, Morihiro Fujisaki, H. Kanokogi, Takamitsu Matsubara
DOI: 10.1109/COASE.2018.8560593
Venue: 2018 IEEE 14th International Conference on Automation Science and Engineering (CASE)
Pages: 304-309
Published: 2018-08-01
Citations: 14

Abstract

This research focuses on applying reinforcement learning to chemical plant control problems in order to optimize production while maintaining plant stability, without requiring knowledge of the plant models. Since a typical chemical plant has a large number of sensors and actuators, its control problem can be formulated as a Markov decision process with a high-dimensional state and a huge number of actions, which may be difficult for previous methods to solve due to computational complexity and sample insufficiency. To overcome these issues, we propose a new reinforcement learning method, Factorial Kernel Dynamic Policy Programming (FKDPP), that employs 1) a factorial policy model and 2) a factor-wise, kernel-based smooth policy update obtained by regularizing with the Kullback-Leibler divergence between the current and updated policies. To validate its effectiveness, FKDPP is evaluated on the Vinyl Acetate Monomer (VAM) plant model, a popular benchmark chemical plant control problem. Compared with previous methods that cannot directly process a huge number of actions, our proposed method uses the same number of training samples and achieves a better control strategy for VAM yield, quality, and plant stability.
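The two key ingredients in the abstract can be illustrated with a short sketch. The joint action is split into factors (one per actuator), the policy factorizes as a product of per-factor policies, and each factor receives a KL-regularized ("smooth") update in the style of Dynamic Policy Programming: the new factor policy is the current one reweighted by exponentiated action preferences. This is an illustrative reconstruction of the idea, not the authors' implementation; the function names, the `eta` parameter, and the use of plain preference vectors (in place of the paper's kernel-based function approximation) are assumptions for the sketch.

```python
import numpy as np

def smooth_factor_update(prior, preferences, eta=1.0):
    """KL-regularized update for ONE action factor (DPP-style sketch).

    prior       : (n,) current policy over this factor's discrete actions
    preferences : (n,) estimated action preferences for this factor
    eta         : inverse temperature; smaller eta keeps the new policy
                  closer (in KL divergence) to the current one.
    """
    # pi_new(a) ∝ pi_old(a) * exp(eta * preference(a))
    logits = np.log(prior) + eta * preferences
    logits -= logits.max()            # subtract max for numerical stability
    unnorm = np.exp(logits)
    return unnorm / unnorm.sum()

def joint_action(factor_policies, rng):
    """Sample a joint action by sampling each factor independently.

    A joint space of n^K actions is handled with K policies of size n,
    which is the point of the factorial policy model.
    """
    return [rng.choice(len(p), p=p) for p in factor_policies]
```

With `eta = 0` the update returns the prior unchanged (maximal regularization); as `eta` grows, probability mass shifts toward high-preference actions, so `eta` trades off greediness against the smoothness of the policy update.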