Robust Reinforcement Learning with Dynamic Distortion Risk Measures
Anthony Coache, Sebastian Jaimungal
arXiv - STAT - Machine Learning, 2024-09-16
Identifier: arxiv-2409.10096
Abstract
In a reinforcement learning (RL) setting, the agent's optimal strategy
heavily depends on her risk preferences and the underlying model dynamics of
the training environment. These two aspects influence the agent's ability to
make well-informed and time-consistent decisions when facing testing
environments. In this work, we devise a framework to solve robust risk-aware RL
problems where we simultaneously account for environmental uncertainty and risk
with a class of dynamic robust distortion risk measures. Robustness is
introduced by considering all models within a Wasserstein ball around a
reference model. We estimate such dynamic robust risk measures using neural
networks by making use of strictly consistent scoring functions, derive policy
gradient formulae using the quantile representation of distortion risk
measures, and construct an actor-critic algorithm to solve this class of robust
risk-aware RL problems. We demonstrate the performance of our algorithm on a
portfolio allocation example.
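
As an illustration of the quantile representation mentioned in the abstract, a static distortion risk measure can be written as a weighted integral of the quantile function, and estimated from samples by weighting the order statistics. The sketch below is not the authors' algorithm: it omits the dynamic (multi-period) structure, the Wasserstein robustification, and the neural-network estimation. It assumes that samples are costs (larger is worse), and the names `distortion_risk`, `cvar_cum_weight`, and `mean_cum_weight` are hypothetical helpers introduced only for this example.

```python
def distortion_risk(samples, cum_weight):
    """Empirical distortion risk of cost samples (assumption: larger = worse).

    cum_weight is a nondecreasing function G on [0, 1] with G(0) = 0 and
    G(1) = 1. The i-th smallest sample receives weight G(i/n) - G((i-1)/n),
    which discretizes the integral of quantile(u) dG(u) over [0, 1].
    """
    xs = sorted(float(s) for s in samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    total = 0.0
    for i, x in enumerate(xs, start=1):
        total += x * (cum_weight(i / n) - cum_weight((i - 1) / n))
    return total


def cvar_cum_weight(alpha):
    """Cumulative weight for CVaR at level alpha: mass only on the upper tail."""
    def G(u):
        return min(1.0, max(0.0, (u - alpha) / (1.0 - alpha)))
    return G


def mean_cum_weight(u):
    """Cumulative weight for the plain expectation (uniform weights)."""
    return u


costs = [4.0, 1.0, 3.0, 2.0]
risk_neutral = distortion_risk(costs, mean_cum_weight)      # sample mean
risk_averse = distortion_risk(costs, cvar_cum_weight(0.5))  # mean of worst half
```

With these four costs, the uniform weighting recovers the sample mean, while the CVaR weighting at level 0.5 averages only the two largest costs, so a risk-averse agent scores the same outcomes as worse than a risk-neutral one would.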