{"title":"一种改进的批处理强化学习控制策略","authors":"P. Zhang, Jie Zhang, Yang Long, Bingzhang Hu","doi":"10.1109/MMAR.2019.8864632","DOIUrl":null,"url":null,"abstract":"Batch processes are significant and essential manufacturing route for the agile manufacturing of high value added products and they are typically difficult to control because of unknown disturbances, model plant mismatches, and highly nonlinear characteristic. Traditional one-step reinforcement learning and neural network have been applied to optimize and control batch processes. However, traditional one-step reinforcement learning and the neural network lack accuracy and robustness leading to unsatisfactory performance. To overcome these issues and difficulties, a modified multi-step action Q-learning algorithm (MMSA) based on multiple step action Q-learning (MSA) is proposed in this paper. For MSA, the action space is divided into some periods of same time steps and the same action is explored with fixed greedy policy being applied continuously during a period. Compared with MSA, the modification of MMSA is that the exploration and selection of action will follow an improved and various greedy policy in the whole system time which can improve the flexibility and speed of the learning algorithm. The proposed algorithm is applied to a highly nonlinear batch process and it is shown giving better control performance than the traditional one-step reinforcement learning and MSA.","PeriodicalId":392498,"journal":{"name":"2019 24th International Conference on Methods and Models in Automation and Robotics (MMAR)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"An improved reinforcement learning control strategy for batch processes\",\"authors\":\"P. Zhang, Jie Zhang, Yang Long, Bingzhang Hu\",\"doi\":\"10.1109/MMAR.2019.8864632\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Batch processes are significant and essential manufacturing route for the agile manufacturing of high value added products and they are typically difficult to control because of unknown disturbances, model plant mismatches, and highly nonlinear characteristic. Traditional one-step reinforcement learning and neural network have been applied to optimize and control batch processes. However, traditional one-step reinforcement learning and the neural network lack accuracy and robustness leading to unsatisfactory performance. To overcome these issues and difficulties, a modified multi-step action Q-learning algorithm (MMSA) based on multiple step action Q-learning (MSA) is proposed in this paper. For MSA, the action space is divided into some periods of same time steps and the same action is explored with fixed greedy policy being applied continuously during a period. Compared with MSA, the modification of MMSA is that the exploration and selection of action will follow an improved and various greedy policy in the whole system time which can improve the flexibility and speed of the learning algorithm. 
The proposed algorithm is applied to a highly nonlinear batch process and it is shown giving better control performance than the traditional one-step reinforcement learning and MSA.\",\"PeriodicalId\":392498,\"journal\":{\"name\":\"2019 24th International Conference on Methods and Models in Automation and Robotics (MMAR)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 24th International Conference on Methods and Models in Automation and Robotics (MMAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MMAR.2019.8864632\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 24th International Conference on Methods and Models in Automation and Robotics (MMAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MMAR.2019.8864632","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An improved reinforcement learning control strategy for batch processes
Batch processes are a significant and essential manufacturing route for the agile manufacturing of high value-added products, and they are typically difficult to control because of unknown disturbances, model-plant mismatches, and highly nonlinear characteristics. Traditional one-step reinforcement learning and neural networks have been applied to optimize and control batch processes. However, they lack accuracy and robustness, leading to unsatisfactory performance. To overcome these issues, a modified multi-step action Q-learning algorithm (MMSA), based on multi-step action Q-learning (MSA), is proposed in this paper. In MSA, the control horizon is divided into periods of the same number of time steps, and within each period a single action, selected under a fixed greedy policy, is applied continuously. The modification in MMSA is that action exploration and selection follow an improved, time-varying greedy policy over the whole system time, which improves the flexibility and learning speed of the algorithm. The proposed algorithm is applied to a highly nonlinear batch process and is shown to give better control performance than traditional one-step reinforcement learning and MSA.
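
As a rough illustration of the MSA/MMSA idea described in the abstract, the sketch below shows tabular Q-learning with multi-step (held) actions: repeating one action over a fixed period plays the role of MSA, and an epsilon-greedy schedule that decays with both the episode count and the position in the batch stands in for the improved, time-varying greedy policy of MMSA. The toy batch-process surrogate, state/action discretisation, constants, and decay schedule are illustrative assumptions, not the paper's actual formulation.

# Minimal sketch: multi-step action Q-learning with a varying epsilon-greedy policy.
# The environment, discretisation, and all constants are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 50        # discretised process states (assumption)
N_ACTIONS = 5        # discretised control actions (assumption)
HORIZON = 100        # time steps per batch run
PERIOD = 10          # steps over which one action is held (the MSA idea)
EPISODES = 500
ALPHA, GAMMA = 0.1, 0.95

Q = np.zeros((N_STATES, N_ACTIONS))

def step(state, action):
    """Toy surrogate for one step of the batch process (assumption)."""
    next_state = (state + action - N_ACTIONS // 2) % N_STATES
    reward = -abs(next_state - N_STATES // 2) / N_STATES  # track a set-point
    return next_state, reward

def epsilon(episode, t):
    """MMSA-style varying greedy policy: exploration decays with both the
    episode count and the position in the batch, instead of staying fixed."""
    return max(0.01, 0.5 * (1.0 - episode / EPISODES) * (1.0 - t / HORIZON))

for ep in range(EPISODES):
    state = rng.integers(N_STATES)
    for t0 in range(0, HORIZON, PERIOD):
        # Choose one action for the whole period (multi-step action).
        if rng.random() < epsilon(ep, t0):
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        # Hold the action for PERIOD steps and accumulate the discounted return.
        g, s = 0.0, state
        for k in range(PERIOD):
            s, r = step(s, action)
            g += (GAMMA ** k) * r
        # Multi-step Q-learning update for the held action.
        target = g + (GAMMA ** PERIOD) * np.max(Q[s])
        Q[state, action] += ALPHA * (target - Q[state, action])
        state = s

print("Learned greedy action from the mid state:", int(np.argmax(Q[N_STATES // 2])))

Setting epsilon to a constant recovers the fixed greedy policy of MSA in this sketch; the decaying schedule is one plausible reading of the "improved and varying greedy policy" described in the abstract.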