A Weighted Smooth Q-Learning Algorithm

Impact factor: 2.4, Q2 (Automation & Control Systems)

IEEE Control Systems Letters. Pub date: 2025-03-13. DOI: 10.1109/LCSYS.2025.3551265

V. Antony Vijesh;S. R. Shreyas

Citations: 0

Abstract

Q-learning and double Q-learning are well-known sample-based, off-policy reinforcement learning algorithms. However, Q-learning suffers from overestimation bias, while double Q-learning suffers from underestimation bias. To address these issues, this letter proposes a weighted smooth Q-learning (WSQL) algorithm, which replaces the maximum operator with a weighted combination of the mellowmax operator and the log-sum-exp operator. First, a new stochastic-approximation-based result is derived, and the almost sure convergence of the proposed WSQL algorithm follows as a consequence. Further, a sufficient condition for the boundedness of the WSQL algorithm is obtained. Numerical experiments on benchmark examples validate the effectiveness of the proposed algorithm.
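The operator substitution described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact weighting scheme, so the convex combination below, the parameter names `omega` (smoothing temperature) and `beta` (weight on mellowmax), and the helper `wsql_style_update` are all assumptions made for illustration. Only the standard definitions of mellowmax and log-sum-exp are taken as given.

```python
import math


def log_sum_exp(values, omega):
    """Scaled log-sum-exp: (1/omega) * log(sum(exp(omega * v))), omega > 0."""
    m = max(values)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(omega * (v - m)) for v in values)) / omega


def mellowmax(values, omega):
    """Mellowmax: (1/omega) * log(mean(exp(omega * v))), omega > 0."""
    return log_sum_exp(values, omega) - math.log(len(values)) / omega


def weighted_smooth_max(values, omega, beta):
    """ASSUMED form: convex combination of mellowmax and log-sum-exp.

    beta in [0, 1] is a hypothetical weight; the paper's exact weighting
    is not stated in the abstract.
    """
    return beta * mellowmax(values, omega) + (1.0 - beta) * log_sum_exp(values, omega)


def wsql_style_update(q, state, action, reward, next_state,
                      alpha, gamma, omega, beta):
    """One tabular Q-learning step with the max replaced by the smooth operator.

    q is a list of per-state lists of action values (hypothetical layout).
    """
    target = reward + gamma * weighted_smooth_max(q[next_state], omega, beta)
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

For `omega > 0`, mellowmax never exceeds the true maximum and log-sum-exp never falls below it, so under the assumed convex combination `beta` moves the bootstrap target between a value at or below the max and one at or above it. That is one plausible reading of how a weighted operator could trade off the two biases mentioned in the abstract.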

Source journal

IEEE Control Systems Letters (Mathematics: Control and Optimization)

CiteScore: 4.40

Self-citation rate: 13.30%

Articles published: 471