Attention-based Partial Decoupling of Policy and Value for Generalization in Reinforcement Learning
Authors: N. Nafi, Creighton Glasscock, W. Hsu
Venue: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)
Published: 2022-12-01
DOI: 10.1109/ICMLA55696.2022.00011
Citations: 4
Abstract
In this work, we introduce Attention-based Partially Decoupled Actor-Critic (APDAC), an actor-critic architecture for generalization in reinforcement learning that partially separates the policy and value functions. To learn directly from images, traditional actor-critic architectures use a shared network to represent both the policy and the value function. While a shared representation allows parameter and feature sharing, it can also lead to overfitting that catastrophically damages generalization performance. On the other hand, two fully separate networks for policy and value can help avoid overfitting and reduce the generalization gap, but at the cost of added complexity in both architecture design and computation time. APDAC is a hybrid architecture that combines the strengths of both approaches by sharing the initial layer blocks of the network and separating the later ones for policy and value. APDAC also incorporates an attention mechanism to enable robust representation learning. We present visualizations of the learned policy and value that explain the trained agent's perception. Our empirical analysis, including an ablation study, shows that APDAC significantly outperforms the standard PPO baseline on the challenging RL generalization benchmark Procgen and achieves performance competitive with the recent state-of-the-art method (IDAAC), while using fewer convolutional layers and requiring less computational time. Our code is available at https://github.com/nasiknafi/apdac.
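The partial decoupling described above can be sketched in a few lines: a shared initial block produces a representation, an attention step re-weights its features, and only then do separate policy and value heads branch off. The following is a minimal NumPy sketch, not the authors' implementation; all layer sizes, the dense (rather than convolutional) shared block, and the simple feature-gating form of attention are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
OBS_DIM, HIDDEN, N_ACTIONS = 64, 32, 15

# Shared initial block (stands in for the paper's shared conv blocks).
W_shared = rng.normal(0.0, 0.1, (OBS_DIM, HIDDEN))

# Attention parameters: a simple feature-gating stand-in for the
# paper's attention mechanism.
W_attn = rng.normal(0.0, 0.1, (HIDDEN, HIDDEN))

# Decoupled later blocks: separate parameters for policy and value,
# so gradients through each head update distinct weights.
W_pi = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS))
W_v = rng.normal(0.0, 0.1, (HIDDEN, 1))

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(obs):
    """One forward pass: shared trunk -> attention -> two heads."""
    h = np.maximum(obs @ W_shared, 0.0)   # shared representation (ReLU)
    attn = softmax(h @ W_attn)            # attention weights over features
    h = h * attn                          # attended features feed both heads
    pi = softmax(h @ W_pi)                # policy head: action distribution
    v = float(h @ W_v)                    # value head: scalar estimate
    return pi, v

pi, v = forward(rng.normal(size=OBS_DIM))
```

Because only `W_pi` and `W_v` are head-specific, the shared trunk and attention parameters still benefit from feature sharing, while the decoupled heads avoid the policy and value objectives interfering in the final layers.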