{"title":"深度q -网络的自适应安德森加速","authors":"Melike Ermis, Insoon Yang","doi":"10.1109/SSCI47803.2020.9308288","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) has been used for an agent to learn efficient decision-making strategies through its interactions with an environment. However, slow convergence and sample inefficiency of RL algorithms make them impractical for complex real-world problems. In this paper, we present an acceleration scheme, called Anderson acceleration (AA), for RL, where the value function in the next iteration is calculated using a linear combination of value functions in the previous iterations. Since the original AA method suffers from instability, we consider adaptive Anderson acceleration (A3) as a stabilized variant of AA, which contains both adaptive regularization to handle instability and safeguarding to enhance performance. We first apply A3 to value iteration for Q-functions and show its convergence property. To extend the idea of A3 to model-free deep RL, we devise a simple variant of deep Q-networks (DQN). Our experiments on the Atari 2600 benchmark demonstrate that the proposed method outperforms double DQN in terms of both final performance and learning speed.","PeriodicalId":413489,"journal":{"name":"2020 IEEE Symposium Series on Computational Intelligence (SSCI)","volume":"221 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"A3DQN: Adaptive Anderson Acceleration for Deep Q-Networks\",\"authors\":\"Melike Ermis, Insoon Yang\",\"doi\":\"10.1109/SSCI47803.2020.9308288\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning (RL) has been used for an agent to learn efficient decision-making strategies through its interactions with an environment. However, slow convergence and sample inefficiency of RL algorithms make them impractical for complex real-world problems. In this paper, we present an acceleration scheme, called Anderson acceleration (AA), for RL, where the value function in the next iteration is calculated using a linear combination of value functions in the previous iterations. Since the original AA method suffers from instability, we consider adaptive Anderson acceleration (A3) as a stabilized variant of AA, which contains both adaptive regularization to handle instability and safeguarding to enhance performance. We first apply A3 to value iteration for Q-functions and show its convergence property. To extend the idea of A3 to model-free deep RL, we devise a simple variant of deep Q-networks (DQN). 
Our experiments on the Atari 2600 benchmark demonstrate that the proposed method outperforms double DQN in terms of both final performance and learning speed.\",\"PeriodicalId\":413489,\"journal\":{\"name\":\"2020 IEEE Symposium Series on Computational Intelligence (SSCI)\",\"volume\":\"221 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE Symposium Series on Computational Intelligence (SSCI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SSCI47803.2020.9308288\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE Symposium Series on Computational Intelligence (SSCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SSCI47803.2020.9308288","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A3DQN: Adaptive Anderson Acceleration for Deep Q-Networks
Reinforcement learning (RL) enables an agent to learn efficient decision-making strategies through its interactions with an environment. However, the slow convergence and sample inefficiency of RL algorithms make them impractical for complex real-world problems. In this paper, we present an acceleration scheme for RL, called Anderson acceleration (AA), in which the value function at the next iteration is computed as a linear combination of the value functions from previous iterations. Since the original AA method suffers from instability, we consider adaptive Anderson acceleration (A3), a stabilized variant of AA that combines adaptive regularization to handle instability with safeguarding to enhance performance. We first apply A3 to value iteration for Q-functions and establish its convergence property. To extend the idea of A3 to model-free deep RL, we devise a simple variant of deep Q-networks (DQN). Our experiments on the Atari 2600 benchmark demonstrate that the proposed method outperforms double DQN in terms of both final performance and learning speed.
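To make the "linear combination of previous value functions" concrete, the following is a minimal NumPy sketch of plain (type-II) Anderson acceleration applied to tabular Q-value iteration. It is illustrative only: it implements the basic AA scheme the abstract describes, not the paper's A3 variant (its adaptive regularization and safeguarding are omitted) and not the DQN extension. The function names, the fixed memory size m, and the small Tikhonov term reg are assumptions made for this example.

import numpy as np

def bellman_operator(Q, P, R, gamma):
    # Bellman optimality operator for a tabular MDP.
    # P: (S, A, S) transition probabilities, R: (S, A) rewards, Q: (S, A).
    V = Q.max(axis=1)           # greedy state values, shape (S,)
    return R + gamma * P @ V    # expected reward plus discounted next-state value, shape (S, A)

def anderson_q_iteration(P, R, gamma, m=5, iters=200, reg=1e-8):
    # Anderson-accelerated Q-value iteration (vanilla AA, not the paper's A3).
    # Keeps the last m iterates and combines their Bellman images with
    # least-squares weights that sum to one; reg is a small Tikhonov term
    # added only for numerical stability of the linear solve (assumption).
    S, A = R.shape
    Q = np.zeros((S, A))
    Qs, TQs = [], []                          # histories of iterates and their Bellman images
    for _ in range(iters):
        TQ = bellman_operator(Q, P, R, gamma)
        Qs.append(Q.ravel())
        TQs.append(TQ.ravel())
        Qs, TQs = Qs[-m:], TQs[-m:]           # keep at most m past iterates
        F = np.stack(TQs) - np.stack(Qs)      # fixed-point residuals, shape (k, S*A)
        # Solve min_alpha ||alpha @ F||^2 subject to sum(alpha) = 1:
        # the minimizer is proportional to (F F^T + reg I)^{-1} 1, then normalized.
        G = F @ F.T + reg * np.eye(len(F))
        alpha = np.linalg.solve(G, np.ones(len(F)))
        alpha /= alpha.sum()
        Q = (alpha @ np.stack(TQs)).reshape(S, A)   # next iterate: weighted combination of past images
    return Q

# Example usage on a random MDP (illustrative only):
# rng = np.random.default_rng(0)
# P = rng.dirichlet(np.ones(4), size=(4, 2))   # 4 states, 2 actions
# R = rng.standard_normal((4, 2))
# Q_star = anderson_q_iteration(P, R, gamma=0.9)

With m = 1 the update reduces to ordinary Q-value iteration; larger memories can accelerate convergence but, as the abstract notes, the unmodified scheme can be unstable, which is precisely what A3's adaptive regularization and safeguarding are designed to address.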