{"title":"一种加权平滑q -学习算法","authors":"V. Antony Vijesh;S. R. Shreyas","doi":"10.1109/LCSYS.2025.3551265","DOIUrl":null,"url":null,"abstract":"Q-learning and double Q-learning are well-known sample-based, off-policy reinforcement learning algorithms. However, Q-learning suffers from overestimation bias, while double Q-learning suffers from underestimation bias. To address these issues, this letter proposes a weighted smooth Q-learning (WSQL) algorithm. The proposed algorithm employs a weighted combination of the mellowmax operator and the log-sum-exp operator in place of the maximum operator. Firstly, a new stochastic approximation based result is derived and as a consequence the almost sure convergence of the proposed WSQL is presented. Further, a sufficient condition for the boundedness of WSQL algorithm is obtained. Numerical experiments are conducted on benchmark examples to validate the effectiveness of the proposed weighted smooth Q-learning algorithm.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"21-26"},"PeriodicalIF":2.4000,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Weighted Smooth Q-Learning Algorithm\",\"authors\":\"V. Antony Vijesh;S. R. Shreyas\",\"doi\":\"10.1109/LCSYS.2025.3551265\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Q-learning and double Q-learning are well-known sample-based, off-policy reinforcement learning algorithms. However, Q-learning suffers from overestimation bias, while double Q-learning suffers from underestimation bias. To address these issues, this letter proposes a weighted smooth Q-learning (WSQL) algorithm. The proposed algorithm employs a weighted combination of the mellowmax operator and the log-sum-exp operator in place of the maximum operator. 
Firstly, a new stochastic approximation based result is derived and as a consequence the almost sure convergence of the proposed WSQL is presented. Further, a sufficient condition for the boundedness of WSQL algorithm is obtained. Numerical experiments are conducted on benchmark examples to validate the effectiveness of the proposed weighted smooth Q-learning algorithm.\",\"PeriodicalId\":37235,\"journal\":{\"name\":\"IEEE Control Systems Letters\",\"volume\":\"9 \",\"pages\":\"21-26\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2025-03-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Control Systems Letters\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10925426/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Control Systems Letters","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10925426/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Q-learning and double Q-learning are well-known sample-based, off-policy reinforcement learning algorithms. However, Q-learning suffers from overestimation bias, while double Q-learning suffers from underestimation bias. To address these issues, this letter proposes a weighted smooth Q-learning (WSQL) algorithm. The proposed algorithm employs a weighted combination of the mellowmax operator and the log-sum-exp operator in place of the maximum operator. Firstly, a new stochastic approximation based result is derived and as a consequence the almost sure convergence of the proposed WSQL is presented. Further, a sufficient condition for the boundedness of WSQL algorithm is obtained. Numerical experiments are conducted on benchmark examples to validate the effectiveness of the proposed weighted smooth Q-learning algorithm.
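The abstract describes replacing the max operator in the Q-learning target with a weighted combination of the mellowmax and log-sum-exp operators. The sketch below illustrates that idea in a tabular setting; the weight `w` and temperatures `omega` and `beta` are illustrative placeholders, not the paper's actual weighting scheme or parameter choices. The intuition is that log-sum-exp upper-bounds the max (pushing toward overestimation) while mellowmax lower-bounds it, so a convex combination can trade the two biases off.

```python
import numpy as np

def mellowmax(x, omega):
    """Mellowmax: (1/omega) * log( mean(exp(omega * x)) ); never exceeds max(x)."""
    x = np.asarray(x, dtype=float)
    c = omega * x.max()  # shift inside the exp for numerical stability
    return (c + np.log(np.mean(np.exp(omega * x - c)))) / omega

def log_sum_exp(x, beta):
    """Log-sum-exp: (1/beta) * log( sum(exp(beta * x)) ); never below max(x)."""
    x = np.asarray(x, dtype=float)
    c = beta * x.max()
    return (c + np.log(np.sum(np.exp(beta * x - c)))) / beta

def weighted_smooth_max(x, w=0.5, omega=5.0, beta=5.0):
    """Weighted combination of the two smooth operators, standing in for max."""
    return w * mellowmax(x, omega) + (1.0 - w) * log_sum_exp(x, beta)

def wsql_update(Q, s, a, r, s_next,
                alpha=0.1, gamma=0.99, w=0.5, omega=5.0, beta=5.0):
    """One tabular update with the smooth operator replacing max in the target."""
    target = r + gamma * weighted_smooth_max(Q[s_next], w, omega, beta)
    Q[s, a] += alpha * (target - Q[s, a])
```

As `omega` and `beta` grow, both operators approach the hard max, recovering standard Q-learning; the paper's convergence and boundedness results concern the stochastic-approximation iteration built from such an operator, which this sketch does not reproduce.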