{"title":"Moor: Model-based offline policy optimization with a risk dynamics model","authors":"Xiaolong Su, Peng Li, Shaofei Chen","doi":"10.1007/s40747-024-01621-x","DOIUrl":null,"url":null,"abstract":"<p>Offline reinforcement learning (RL) has been widely used in safety-critical domains by avoiding dangerous and costly online interaction. A significant challenge is addressing uncertainties and risks outside of offline data. Risk-sensitive offline RL attempts to solve this issue by risk aversion. However, current model-based approaches only extract state transition information and reward information using dynamics models, which cannot capture risk information implicit in offline data and may result in the misuse of high-risk data. In this work, we propose a model-based offline policy optimization approach with a risk dynamics model (MOOR). Specifically, we construct a risk dynamics model using a quantile network that can learn the risk information of data, then we reshape model-generated data based on errors of the risk dynamics model and the risk information of data. Finally, we use a risk-averse algorithm to learn the policy on the combined dataset of offline and generated data. We theoretically prove that MOOR can identify risk information of data and avoid utilizing high-risk data, our experiments show that MOOR outperforms existing approaches and achieves state-of-the-art results in risk-sensitive D4RL and risky navigation tasks.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"71 1","pages":""},"PeriodicalIF":5.0000,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Complex & Intelligent Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s40747-024-01621-x","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Offline reinforcement learning (RL) is widely used in safety-critical domains because it avoids dangerous and costly online interaction. A significant challenge is addressing uncertainties and risks that lie outside the offline data. Risk-sensitive offline RL attempts to solve this issue through risk aversion. However, current model-based approaches extract only state-transition and reward information with their dynamics models; they cannot capture the risk information implicit in offline data and may therefore misuse high-risk data. In this work, we propose a model-based offline policy optimization approach with a risk dynamics model (MOOR). Specifically, we construct a risk dynamics model using a quantile network that can learn the risk information of the data, and we then reshape model-generated data based on the errors of the risk dynamics model and the risk information of the data. Finally, we use a risk-averse algorithm to learn the policy on the combined dataset of offline and generated data. We theoretically prove that MOOR can identify the risk information of data and avoid using high-risk data, and our experiments show that MOOR outperforms existing approaches, achieving state-of-the-art results on risk-sensitive D4RL and risky navigation tasks.
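The abstract does not include implementation details, so as a rough illustration of the quantile-network component it describes, the following is a minimal PyTorch sketch of a distributional risk model trained with the standard quantile-Huber (pinball) regression loss familiar from QR-DQN. All names here (QuantileRiskModel, quantile_huber_loss), the choice of risk target, and the architecture are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

class QuantileRiskModel(nn.Module):
    """Predicts N quantiles of a scalar risk signal (e.g. per-step cost)
    conditioned on a state-action pair. Hypothetical sketch: the paper's
    actual architecture and risk target are not specified in the abstract."""
    def __init__(self, state_dim, action_dim, n_quantiles=32, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_quantiles),
        )
        # Fixed quantile fractions tau_i = (i + 0.5) / N.
        self.register_buffer(
            "taus", (torch.arange(n_quantiles) + 0.5) / n_quantiles)

    def forward(self, state, action):
        # Returns the N predicted quantile values, shape (batch, N).
        return self.net(torch.cat([state, action], dim=-1))

def quantile_huber_loss(pred, target, taus, kappa=1.0):
    """Quantile-Huber regression loss.
    pred: (B, N) predicted quantiles; target: (B,) observed risk values."""
    td = target.unsqueeze(1) - pred                      # (B, N) residuals
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    # Asymmetric weighting by tau turns the Huber loss into a quantile estimator.
    weight = (taus - (td.detach() < 0).float()).abs()
    return (weight * huber / kappa).mean()
```

A model of this kind yields a full predicted risk distribution per state-action pair, so at data-generation time one could, for instance, downweight or discard synthetic transitions whose upper quantiles exceed a threshold. This is consistent with the abstract's description of reshaping model-generated data, but the exact reshaping rule used in the paper is not specified here.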
Journal Introduction
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools, and techniques intended to foster cross-fertilization among the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research on which the journal focuses will expand the boundaries of our understanding by investigating the principles and processes underlying many of the most profound problems facing society today.