End-to-end reinforcement learning of Koopman models for economic nonlinear model predictive control

IF 3.9 | Zone 2, Engineering & Technology | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computers & Chemical Engineering | Pub Date: 2024-08-05 | DOI: 10.1016/j.compchemeng.2024.108824

Daniel Mayfrank, Alexander Mitsos, Manuel Dahmen

Computers & Chemical Engineering, Volume 190, Article 108824. Full text: https://www.sciencedirect.com/science/article/pii/S0098135424002424

Citations: 0

Abstract

(Economic) nonlinear model predictive control ((e)NMPC) requires dynamic models that are sufficiently accurate and computationally tractable. Data-driven surrogate models for mechanistic models can reduce the computational burden of (e)NMPC; however, such models are typically trained by system identification for maximum prediction accuracy on simulation samples and perform suboptimally in (e)NMPC. We present a method for end-to-end reinforcement learning of Koopman surrogate models for optimal performance as part of (e)NMPC. We apply our method to two applications derived from an established nonlinear continuous stirred-tank reactor model. The controller performance is compared to that of (e)NMPCs utilizing models trained using system identification, and model-free neural network controllers trained using reinforcement learning. We show that the end-to-end trained models outperform those trained using system identification in (e)NMPC, and that, in contrast to the neural network controllers, the (e)NMPC controllers can react to changes in the control setting without retraining.
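To make the idea of a Koopman surrogate concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical two-state toy plant (not the continuous stirred-tank reactor used in the paper) whose dynamics become exactly linear once the state is lifted with the observables (x1, x2, x1^2). The lifting function, the matrices `A` and `B`, and all coefficients are illustrative assumptions derived by hand; in the paper such models are learned from data, either by system identification or by reinforcement learning.

```python
# Toy plant (an assumption for illustration, not the CSTR from the paper):
#   x1_next = a * x1
#   x2_next = b * x2 + c * x1**2 + u
A_COEF, B_COEF, C_COEF = 0.9, 0.5, 0.3


def plant_step(x, u):
    """Advance the nonlinear toy plant by one time step."""
    x1, x2 = x
    return (A_COEF * x1, B_COEF * x2 + C_COEF * x1 ** 2 + u)


def lift(x):
    """Hypothetical observables: z = (x1, x2, x1^2)."""
    x1, x2 = x
    return [x1, x2, x1 ** 2]


# In the lifted coordinates the toy plant is exactly linear: z_next = A z + B u.
A = [
    [A_COEF, 0.0, 0.0],
    [0.0, B_COEF, C_COEF],
    [0.0, 0.0, A_COEF ** 2],
]
B = [0.0, 1.0, 0.0]


def koopman_step(z, u):
    """Advance the lifted state with the linear surrogate dynamics."""
    return [sum(A[i][j] * z[j] for j in range(3)) + B[i] * u for i in range(3)]


def decode(z):
    """Recover the physical state; here it is the first two observables."""
    return (z[0], z[1])


def rollout_plant(x0, controls):
    """Simulate the nonlinear plant over a sequence of control inputs."""
    x, trajectory = tuple(x0), []
    for u in controls:
        x = plant_step(x, u)
        trajectory.append(x)
    return trajectory


def rollout_koopman(x0, controls):
    """Predict the same trajectory using only linear steps in lifted space."""
    z, trajectory = lift(x0), []
    for u in controls:
        z = koopman_step(z, u)
        trajectory.append(decode(z))
    return trajectory
```

For this toy plant the two rollouts agree up to floating-point rounding. The point of the sketch is that the nonlinearity sits entirely in `lift`, which is evaluated once at the initial state, while the multi-step prediction is linear in the lifted state and the controls. For real processes a finite set of observables is generally only approximate, which is where the choice of training objective discussed in the abstract matters.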

Source journal

Computers & Chemical Engineering (Engineering & Technology: Engineering, Chemical)

CiteScore: 8.70

Self-citation rate: 14.00%

Articles published: 374

Review time: 70 days

Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.