基于最大加线性逼近的拟合q -迭代

IF 2.4 Q2 AUTOMATION & CONTROL SYSTEMS

IEEE Control Systems Letters Pub Date : 2024-12-18 DOI:10.1109/LCSYS.2024.3520060

Yichen Liu;Mohamad Amin Sharifi Kolarijani

{"title":"基于最大加线性逼近的拟合q -迭代","authors":"Yichen Liu;Mohamad Amin Sharifi Kolarijani","doi":"10.1109/LCSYS.2024.3520060","DOIUrl":null,"url":null,"abstract":"In this letter, we consider the application of max-plus-linear approximators for Q-function in offline reinforcement learning of discounted Markov decision processes. In particular, we incorporate these approximators to propose novel fitted Q-iteration (FQI) algorithms with provable convergence. Exploiting the compatibility of the Bellman operator with max-plus operations, we show that the max-plus-linear regression within each iteration of the proposed FQI algorithm reduces to simple max-plus matrix-vector multiplications. We also consider the variational implementation of the proposed algorithm which leads to a per-iteration complexity that is independent of the number of samples.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"8 ","pages":"3201-3206"},"PeriodicalIF":2.4000,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fitted Q-Iteration via Max-Plus-Linear Approximation\",\"authors\":\"Yichen Liu;Mohamad Amin Sharifi Kolarijani\",\"doi\":\"10.1109/LCSYS.2024.3520060\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this letter, we consider the application of max-plus-linear approximators for Q-function in offline reinforcement learning of discounted Markov decision processes. In particular, we incorporate these approximators to propose novel fitted Q-iteration (FQI) algorithms with provable convergence. Exploiting the compatibility of the Bellman operator with max-plus operations, we show that the max-plus-linear regression within each iteration of the proposed FQI algorithm reduces to simple max-plus matrix-vector multiplications. We also consider the variational implementation of the proposed algorithm which leads to a per-iteration complexity that is independent of the number of samples.\",\"PeriodicalId\":37235,\"journal\":{\"name\":\"IEEE Control Systems Letters\",\"volume\":\"8 \",\"pages\":\"3201-3206\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-12-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Control Systems Letters\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10806735/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Control Systems Letters","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10806735/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

在这封信中，我们考虑了q函数的最大加线性逼近器在贴现马尔可夫决策过程的离线强化学习中的应用。特别地，我们结合这些近似来提出新颖的具有可证明收敛性的拟合q迭代（FQI）算法。利用Bellman算子与max-plus操作的兼容性，我们证明了所提出的FQI算法的每次迭代中的max-plus线性回归可以简化为简单的max-plus矩阵向量乘法。我们还考虑了所提出算法的变分实现，这导致了独立于样本数量的每次迭代复杂性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Fitted Q-Iteration via Max-Plus-Linear Approximation

In this letter, we consider the application of max-plus-linear approximators for Q-function in offline reinforcement learning of discounted Markov decision processes. In particular, we incorporate these approximators to propose novel fitted Q-iteration (FQI) algorithms with provable convergence. Exploiting the compatibility of the Bellman operator with max-plus operations, we show that the max-plus-linear regression within each iteration of the proposed FQI algorithm reduces to simple max-plus matrix-vector multiplications. We also consider the variational implementation of the proposed algorithm which leads to a per-iteration complexity that is independent of the number of samples.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Control Systems Letters Mathematics-Control and Optimization

CiteScore

4.40

自引率

13.30%

发文量

471