{"title":"深度学习正则化在离线 RL 中对行动者的作用","authors":"Denis Tarasov, Anja Surina, Caglar Gulcehre","doi":"arxiv-2409.07606","DOIUrl":null,"url":null,"abstract":"Deep learning regularization techniques, such as \\emph{dropout}, \\emph{layer\nnormalization}, or \\emph{weight decay}, are widely adopted in the construction\nof modern artificial neural networks, often resulting in more robust training\nprocesses and improved generalization capabilities. However, in the domain of\n\\emph{Reinforcement Learning} (RL), the application of these techniques has\nbeen limited, usually applied to value function estimators\n\\citep{hiraoka2021dropout, smith2022walk}, and may result in detrimental\neffects. This issue is even more pronounced in offline RL settings, which bear\ngreater similarity to supervised learning but have received less attention.\nRecent work in continuous offline RL has demonstrated that while we can build\nsufficiently powerful critic networks, the generalization of actor networks\nremains a bottleneck. In this study, we empirically show that applying standard\nregularization techniques to actor networks in offline RL actor-critic\nalgorithms yields improvements of 6\\% on average across two algorithms and\nthree different continuous D4RL domains.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Role of Deep Learning Regularizations on Actors in Offline RL\",\"authors\":\"Denis Tarasov, Anja Surina, Caglar Gulcehre\",\"doi\":\"arxiv-2409.07606\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning regularization techniques, such as \\\\emph{dropout}, \\\\emph{layer\\nnormalization}, or \\\\emph{weight decay}, are widely adopted in the construction\\nof modern artificial neural networks, often resulting in more robust training\\nprocesses and improved generalization capabilities. However, in the domain of\\n\\\\emph{Reinforcement Learning} (RL), the application of these techniques has\\nbeen limited, usually applied to value function estimators\\n\\\\citep{hiraoka2021dropout, smith2022walk}, and may result in detrimental\\neffects. This issue is even more pronounced in offline RL settings, which bear\\ngreater similarity to supervised learning but have received less attention.\\nRecent work in continuous offline RL has demonstrated that while we can build\\nsufficiently powerful critic networks, the generalization of actor networks\\nremains a bottleneck. 
In this study, we empirically show that applying standard\\nregularization techniques to actor networks in offline RL actor-critic\\nalgorithms yields improvements of 6\\\\% on average across two algorithms and\\nthree different continuous D4RL domains.\",\"PeriodicalId\":501301,\"journal\":{\"name\":\"arXiv - CS - Machine Learning\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.07606\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07606","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The Role of Deep Learning Regularizations on Actors in Offline RL
Deep learning regularization techniques, such as \emph{dropout}, \emph{layer normalization}, and \emph{weight decay}, are widely adopted in the construction of modern artificial neural networks, often yielding more robust training and improved generalization. In \emph{Reinforcement Learning} (RL), however, their use has been limited, typically restricted to value function estimators \citep{hiraoka2021dropout, smith2022walk}, and can even be detrimental. This issue is even more pronounced in offline RL, which bears greater similarity to supervised learning but has received less attention. Recent work in continuous offline RL has shown that while sufficiently powerful critic networks can be built, the generalization of actor networks remains a bottleneck. In this study, we empirically show that applying standard regularization techniques to the actor networks of offline RL actor-critic algorithms yields improvements of 6\% on average across two algorithms and three continuous D4RL domains.
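
The abstract names three regularizers applied to the actor: dropout, layer normalization, and weight decay. Below is a minimal PyTorch sketch of what a regularized actor for continuous control might look like; the architecture, hyperparameter values, and the deterministic Tanh policy head are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch (assumed architecture, not the authors' implementation) of an
# actor network with standard deep learning regularizers: LayerNorm and Dropout
# inside the policy MLP, and weight decay applied through the optimizer.
import torch
import torch.nn as nn


class RegularizedActor(nn.Module):
    """Deterministic policy network with layer normalization and dropout."""

    def __init__(self, state_dim: int, action_dim: int,
                 hidden_dim: int = 256, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),  # actions assumed to lie in [-1, 1], as in D4RL locomotion tasks
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


# Weight decay is added via a decoupled optimizer (AdamW); values are illustrative.
actor = RegularizedActor(state_dim=17, action_dim=6)  # e.g., HalfCheetah-sized dimensions
actor_optimizer = torch.optim.AdamW(actor.parameters(), lr=3e-4, weight_decay=1e-4)
```

In an actor-critic training loop, such an actor would be optimized against the critic's value estimates as usual; the only change the abstract describes is adding these regularizers to the actor rather than (or in addition to) the critic.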