Enhancing Generalization of Offline RL in Data-Limited Settings With Heuristic Rules
Briti Gangopadhyay; Zhao Wang; Jia-Fong Yeh; Shingo Takamatsu
IEEE Transactions on Artificial Intelligence, vol. 6, no. 8, pp. 2291-2301, published 25 February 2025. DOI: 10.1109/TAI.2025.3544971
Abstract
With the ability to learn from static datasets, offline reinforcement learning (RL) emerges as a compelling avenue for real-world applications. However, state-of-the-art offline RL algorithms perform suboptimally when confronted with limited data confined to specific regions within the state space. This performance degradation is attributed to the inability of offline RL algorithms to learn appropriate actions for rare or unseen observations. This article proposes a heuristic rule-based regularization technique and adaptively refines the initial knowledge from heuristics to considerably boost performance on limited data with partially omitted states. The key insight is that the regularization term mitigates erroneous actions for sparse samples and unobserved states covered by domain knowledge. Empirical evaluations on standard offline RL datasets demonstrate a substantial average performance increase compared to an ensemble of domain knowledge and existing offline RL algorithms operating on limited data.
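The abstract does not give the exact formulation, but the core idea of regularizing an offline RL actor toward heuristic-rule actions on states the rules cover can be sketched as below. This is a minimal, hypothetical illustration: the rule function, the rule mask, the penalty form, and the weighting coefficient `lam` are assumptions for exposition, not the paper's actual method, which additionally refines the heuristic knowledge adaptively during training.

```python
# Hypothetical sketch: offline RL actor update with a heuristic-rule penalty.
# Network shapes, the rule, and the weighting are illustrative assumptions.
import torch
import torch.nn as nn


def heuristic_action(states: torch.Tensor) -> torch.Tensor:
    """Placeholder domain rule mapping a batch of states to suggested actions
    (in practice this would encode expert knowledge for the task)."""
    return torch.clamp(-states[:, :1], -1.0, 1.0)


def actor_loss(policy: nn.Module,
               critic: nn.Module,
               states: torch.Tensor,
               rule_mask: torch.Tensor,
               lam: float = 0.5) -> torch.Tensor:
    """Standard actor objective (maximize Q) plus a penalty pulling the policy
    toward the heuristic action on states the rules cover (rule_mask == 1).
    `lam` trades off return maximization against rule adherence; in the paper
    the influence of the heuristic knowledge is refined adaptively rather than
    fixed as it is here."""
    actions = policy(states)                                   # proposed actions
    q_term = -critic(states, actions).mean()                   # maximize value
    rule_err = (actions - heuristic_action(states)) ** 2       # deviation from rule
    rule_term = (rule_mask * rule_err.sum(dim=-1)).mean()      # only where rules apply
    return q_term + lam * rule_term
```

On states outside both the dataset and the rules' coverage the penalty vanishes, so the term acts as a prior only where domain knowledge is available, which matches the abstract's claim that the regularizer mitigates erroneous actions for sparse samples and rule-covered unobserved states.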