Toward General Function Approximation in Nonstationary Reinforcement Learning

IF 2.2

IEEE Journal on Selected Areas in Information Theory. Pub Date: 2024-03-29. DOI: 10.1109/JSAIT.2024.3381818

Songtao Feng; Ming Yin; Ruiquan Huang; Yu-Xiang Wang; Jing Yang; Yingbin Liang

Volume 5, pages 190-206. Full text: https://ieeexplore.ieee.org/document/10485378/

Citations: 0

Abstract

Function approximation has been highly successful in reinforcement learning (RL). Although some progress has been made on the theory of nonstationary RL with function approximation under structural assumptions, existing work on nonstationary RL with general function approximation remains limited. In this work, we investigate two approaches for nonstationary RL with general function approximation: a confidence-set based algorithm and a UCB-type algorithm. For the first approach, we introduce a new complexity measure for nonstationary MDPs called dynamic Bellman Eluder (DBE), and then propose SW-OPEA, a confidence-set based algorithm built on this measure. SW-OPEA features a sliding window mechanism and a novel confidence set design for nonstationary MDPs. For the second approach, we propose LSVI-Nonstationary, a UCB-type algorithm that follows the popular least-squares value iteration (LSVI) framework and mitigates the computational efficiency challenge of the confidence-set based approach. LSVI-Nonstationary features a restart mechanism and a new design of the bonus term to handle nonstationarity. The two proposed algorithms outperform existing algorithms for nonstationary linear and tabular MDPs in the small variation budget setting. To the best of our knowledge, these are the first confidence-set based algorithm and the first UCB-type algorithm in the context of nonstationary MDPs.
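The abstract names two ways of discarding stale data, a sliding window and a restart mechanism, without giving details. The sketch below is not SW-OPEA or LSVI-Nonstationary; it only illustrates the generic idea of the two forgetting schemes on a toy drifting reward sequence. The function names, the `window` and `period` parameters, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def sliding_window_slice(history, window):
    """Keep only the most recent `window` observations (hypothetical helper)."""
    return history[-window:] if window > 0 else []


def restart_slice(history, period):
    """Keep only observations collected since the last restart.

    Assumes the learner discards all data every `period` observations,
    so the last restart happened at the largest multiple of `period`
    that does not exceed len(history).
    """
    last_restart = (len(history) // period) * period
    return history[last_restart:]


# Toy nonstationary signal: the mean reward jumps from 0 to 1 halfway through.
rewards = [0.0] * 50 + [1.0] * 50

full_history_estimate = mean(rewards)                          # uses stale data
window_estimate = mean(sliding_window_slice(rewards, 20))      # last 20 samples
restart_estimate = mean(restart_slice(rewards, 30))            # samples 90..99

print(full_history_estimate, window_estimate, restart_estimate)  # 0.5 1.0 1.0
```

In this toy setting both forgetting schemes recover the current mean, while the full-history average is biased by data from before the change. How the window length or restart period should be chosen relative to the variation budget is a question the paper addresses and this sketch does not.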

Source journal

IEEE journal on selected areas in information theory

CiteScore

8.20

Self-citation rate

0.00%

Number of published articles