Fan Zhang, Malu Zhang, Wenyu Chen, Siying Wang, Xin Zhang, Jiayin Li, Yang Yang
{"title":"利用数据稀疏性抑制离线强化学习中的错误加剧。","authors":"Fan Zhang,Malu Zhang,Wenyu Chen,Siying Wang,Xin Zhang,Jiayin Li,Yang Yang","doi":"10.1109/tnnls.2025.3615982","DOIUrl":null,"url":null,"abstract":"Offline reinforcement learning (RL) aims to learn effective agents from previously collected datasets, facilitating the safety and efficiency of RL by avoiding real-time interaction. However, in practical applications, the approximation error of the out-of-distribution (OOD) state-actions can cause considerable overestimation due to error exacerbation during training, finally degrading the performance. In contrast to prior works that merely addressed the OOD state-actions, we discover that all data introduces estimation error whose magnitude is directly related to data sparsity. Consequently, the impact of data sparsity is inevitable and vital when inhibiting the error exacerbation. In this article, we propose an offline RL approach to inhibit error exacerbation with data sparsity (IEEDS), which includes a novel value estimation method to consider the impact of data sparsity on the training of agents. Specifically, the value estimation phase includes two innovations: 1) replace Q-net with V-net, a smaller and denser state space makes data more concentrated, contributing to more accurate value estimation and 2) introduce state sparsity to the training by design state-aware-sparsity Markov decision process (MDP), further lessening the impact of sparse states. We theoretically prove the convergence of IEEDS under state-aware-sparsity MDP. Extensive experiments on offline RL benchmarks reveal that IEEDS's superior performance.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"158 1","pages":""},"PeriodicalIF":8.9000,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Inhibiting Error Exacerbation in Offline Reinforcement Learning With Data Sparsity.\",\"authors\":\"Fan Zhang,Malu Zhang,Wenyu Chen,Siying Wang,Xin Zhang,Jiayin Li,Yang Yang\",\"doi\":\"10.1109/tnnls.2025.3615982\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Offline reinforcement learning (RL) aims to learn effective agents from previously collected datasets, facilitating the safety and efficiency of RL by avoiding real-time interaction. However, in practical applications, the approximation error of the out-of-distribution (OOD) state-actions can cause considerable overestimation due to error exacerbation during training, finally degrading the performance. In contrast to prior works that merely addressed the OOD state-actions, we discover that all data introduces estimation error whose magnitude is directly related to data sparsity. Consequently, the impact of data sparsity is inevitable and vital when inhibiting the error exacerbation. In this article, we propose an offline RL approach to inhibit error exacerbation with data sparsity (IEEDS), which includes a novel value estimation method to consider the impact of data sparsity on the training of agents. Specifically, the value estimation phase includes two innovations: 1) replace Q-net with V-net, a smaller and denser state space makes data more concentrated, contributing to more accurate value estimation and 2) introduce state sparsity to the training by design state-aware-sparsity Markov decision process (MDP), further lessening the impact of sparse states. 
We theoretically prove the convergence of IEEDS under state-aware-sparsity MDP. Extensive experiments on offline RL benchmarks reveal that IEEDS's superior performance.\",\"PeriodicalId\":13303,\"journal\":{\"name\":\"IEEE transactions on neural networks and learning systems\",\"volume\":\"158 1\",\"pages\":\"\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-10-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on neural networks and learning systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tnnls.2025.3615982\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tnnls.2025.3615982","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Inhibiting Error Exacerbation in Offline Reinforcement Learning With Data Sparsity.
Offline reinforcement learning (RL) aims to learn effective agents from previously collected datasets, improving the safety and efficiency of RL by avoiding real-time interaction. In practical applications, however, the approximation error of out-of-distribution (OOD) state-actions can cause considerable overestimation due to error exacerbation during training, ultimately degrading performance. In contrast to prior works that address only OOD state-actions, we find that all data introduce estimation error whose magnitude is directly related to data sparsity. Consequently, accounting for data sparsity is inevitable and vital when inhibiting error exacerbation. In this article, we propose an offline RL approach to inhibit error exacerbation with data sparsity (IEEDS), which includes a novel value estimation method that accounts for the impact of data sparsity on agent training. Specifically, the value estimation phase includes two innovations: 1) replacing the Q-network with a V-network, whose smaller and denser state space concentrates the data and yields more accurate value estimation, and 2) introducing state sparsity into training by designing a state-aware-sparsity Markov decision process (MDP), further lessening the impact of sparse states. We theoretically prove the convergence of IEEDS under the state-aware-sparsity MDP. Extensive experiments on offline RL benchmarks demonstrate IEEDS's superior performance.
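The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: estimating a state-value function V(s) instead of a state-action value Q(s, a), and down-weighting states that the dataset covers sparsely. The names (VNet, sparsity_weight, v_loss) and the kernel-density weighting are illustrative assumptions, not the authors' IEEDS implementation.

```python
# Hypothetical sketch; VNet, sparsity_weight, and v_loss are illustrative
# names, and the density-based weighting is an assumption made for this
# example rather than the method described in the paper.
import torch
import torch.nn as nn


class VNet(nn.Module):
    """State-value network V(s): defined over the state space only, which is
    smaller and denser than the state-action space used by a Q-network."""

    def __init__(self, state_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state):
        return self.net(state).squeeze(-1)


def sparsity_weight(states, dataset_states, bandwidth=1.0):
    """Crude kernel-density proxy for how densely a state is covered by the
    offline dataset; sparser states receive smaller weights in the loss."""
    dists = torch.cdist(states, dataset_states)            # (B, N) pairwise distances
    density = torch.exp(-(dists / bandwidth) ** 2).mean(dim=1)
    return density / (density.max() + 1e-8)                # normalize to (0, 1]


def v_loss(v_net, batch, dataset_states, gamma=0.99):
    """One-step TD loss on V(s), weighted by estimated state density so that
    sparsely covered states contribute less to the value update."""
    s, r, s_next, done = batch["s"], batch["r"], batch["s_next"], batch["done"]
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * v_net(s_next)
    w = sparsity_weight(s, dataset_states)
    return (w * (v_net(s) - target) ** 2).mean()
```

The design intuition, under these assumptions, is that a V-network sees each state regardless of which action was logged, so its training data are denser than a Q-network's, while the density weight suppresses updates from states the dataset barely covers.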
Journal Introduction:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.