Inhibiting Error Exacerbation in Offline Reinforcement Learning With Data Sparsity.

IF 8.9 · CAS Tier 1 (Computer Science) · JCR Q1, Computer Science, Artificial Intelligence
Fan Zhang, Malu Zhang, Wenyu Chen, Siying Wang, Xin Zhang, Jiayin Li, Yang Yang
{"title":"Inhibiting Error Exacerbation in Offline Reinforcement Learning With Data Sparsity.","authors":"Fan Zhang,Malu Zhang,Wenyu Chen,Siying Wang,Xin Zhang,Jiayin Li,Yang Yang","doi":"10.1109/tnnls.2025.3615982","DOIUrl":null,"url":null,"abstract":"Offline reinforcement learning (RL) aims to learn effective agents from previously collected datasets, facilitating the safety and efficiency of RL by avoiding real-time interaction. However, in practical applications, the approximation error of the out-of-distribution (OOD) state-actions can cause considerable overestimation due to error exacerbation during training, finally degrading the performance. In contrast to prior works that merely addressed the OOD state-actions, we discover that all data introduces estimation error whose magnitude is directly related to data sparsity. Consequently, the impact of data sparsity is inevitable and vital when inhibiting the error exacerbation. In this article, we propose an offline RL approach to inhibit error exacerbation with data sparsity (IEEDS), which includes a novel value estimation method to consider the impact of data sparsity on the training of agents. Specifically, the value estimation phase includes two innovations: 1) replace Q-net with V-net, a smaller and denser state space makes data more concentrated, contributing to more accurate value estimation and 2) introduce state sparsity to the training by design state-aware-sparsity Markov decision process (MDP), further lessening the impact of sparse states. We theoretically prove the convergence of IEEDS under state-aware-sparsity MDP. Extensive experiments on offline RL benchmarks reveal that IEEDS's superior performance.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"158 1","pages":""},"PeriodicalIF":8.9000,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tnnls.2025.3615982","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Offline reinforcement learning (RL) aims to learn effective agents from previously collected datasets, improving the safety and efficiency of RL by avoiding real-time interaction. However, in practical applications, the approximation error of out-of-distribution (OOD) state-actions can cause considerable overestimation due to error exacerbation during training, ultimately degrading performance. In contrast to prior works that merely addressed OOD state-actions, we discover that all data introduce estimation error whose magnitude is directly related to data sparsity. Consequently, the impact of data sparsity is inevitable and vital when inhibiting error exacerbation. In this article, we propose an offline RL approach to inhibit error exacerbation with data sparsity (IEEDS), which includes a novel value estimation method that accounts for the impact of data sparsity on the training of agents. Specifically, the value estimation phase includes two innovations: 1) replacing the Q-net with a V-net, since the smaller and denser state space makes the data more concentrated, contributing to more accurate value estimation, and 2) introducing state sparsity into training by designing a state-aware-sparsity Markov decision process (MDP), further lessening the impact of sparse states. We theoretically prove the convergence of IEEDS under the state-aware-sparsity MDP. Extensive experiments on offline RL benchmarks demonstrate IEEDS's superior performance.
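The first innovation above, estimating values with a V-net over states rather than a Q-net over state-action pairs, can be illustrated with a generic offline value-estimation sketch. The snippet below is a minimal, hypothetical example and not the IEEDS implementation: it fits a state-value network with expectile regression, one common way to learn V(s) from a fixed dataset without querying out-of-distribution actions. All names (VNet, expectile_loss, tau) and the toy batch are assumptions for illustration only.

```python
# Hypothetical sketch (not the IEEDS method): fit a state-value network V(s)
# from a fixed offline batch using expectile regression, which avoids
# evaluating actions outside the dataset.
import torch
import torch.nn as nn

class VNet(nn.Module):
    """Small MLP mapping states to scalar values V(s)."""
    def __init__(self, state_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s).squeeze(-1)

def expectile_loss(pred: torch.Tensor, target: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    """Asymmetric L2 loss; tau > 0.5 biases V(s) toward higher in-dataset returns."""
    diff = target - pred
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()

# Toy offline batch: states, rewards, next states, done flags (random placeholders).
state_dim, batch = 8, 64
s = torch.randn(batch, state_dim)
r = torch.randn(batch)
s_next = torch.randn(batch, state_dim)
done = torch.zeros(batch)

v_net, gamma = VNet(state_dim), 0.99
opt = torch.optim.Adam(v_net.parameters(), lr=3e-4)

for _ in range(100):  # a few gradient steps on the fixed batch
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * v_net(s_next)  # bootstrapped value target
    loss = expectile_loss(v_net(s), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The design point the abstract emphasizes is that the state space is smaller and denser than the state-action space, so a V-net sees more concentrated data per region; how IEEDS further weights training by state sparsity (the state-aware-sparsity MDP) is specific to the paper and not reproduced here.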
Source journal
IEEE Transactions on Neural Networks and Learning Systems
Categories: Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture
CiteScore: 23.80
Self-citation rate: 9.60%
Articles per year: 2,102
Review time: 3-8 weeks
Journal introduction: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.