Continuous Adaptation in Nonstationary Environments Based on Actor-Critic Algorithm
Yang Yu, Zhixiong Gan, Chun Xing Li, Hui Luo, Jiashou Wang
{"title":"基于actor - critical算法的非平稳环境连续自适应","authors":"Yang Yu, Zhixiong Gan, Chun Xing Li, Hui Luo, Jiashou Wang","doi":"10.1109/ISPACS57703.2022.10082809","DOIUrl":null,"url":null,"abstract":"In reinforcement learning, the training process for the agent is highly relevant to the dynamics, Agent's dynamics are generally considered to be parts of environments. When dynamics changed, the previous learning model may be unable to adapt to the new environment. In this paper, we propose a simple adaptive method based on the traditional actor-critic framework. A new component named Adaptor is added to the original model. The kernel of the Adaptor is a network which has the same structure as the Critic. The component can adaptively adjust the Actor's actions. Experiments show the agents pre-trained in different environments including Gym and MuJoCo achieve better performances in the tasks of adapting to the new dynamics-changed environments than the original methods. Moreover, the proposed method shows superior performance over the baseline method just learning form the scratch in some original tasks.","PeriodicalId":410603,"journal":{"name":"2022 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Continuous Adaptation in Nonstationary Environments Based on Actor-Critic Algorithm\",\"authors\":\"Yang Yu, Zhixiong Gan, Chun Xing Li, Hui Luo, Jiashou Wang\",\"doi\":\"10.1109/ISPACS57703.2022.10082809\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In reinforcement learning, the training process for the agent is highly relevant to the dynamics, Agent's dynamics are generally considered to be parts of environments. When dynamics changed, the previous learning model may be unable to adapt to the new environment. In this paper, we propose a simple adaptive method based on the traditional actor-critic framework. A new component named Adaptor is added to the original model. The kernel of the Adaptor is a network which has the same structure as the Critic. The component can adaptively adjust the Actor's actions. Experiments show the agents pre-trained in different environments including Gym and MuJoCo achieve better performances in the tasks of adapting to the new dynamics-changed environments than the original methods. 
Moreover, the proposed method shows superior performance over the baseline method just learning form the scratch in some original tasks.\",\"PeriodicalId\":410603,\"journal\":{\"name\":\"2022 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISPACS57703.2022.10082809\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPACS57703.2022.10082809","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: In reinforcement learning, an agent's training process is closely tied to the transition dynamics, which are generally considered part of the environment. When the dynamics change, a previously trained model may be unable to adapt to the new environment. In this paper, we propose a simple adaptive method based on the traditional actor-critic framework. A new component named Adaptor is added to the original model. The kernel of the Adaptor is a network with the same structure as the Critic, and the component adaptively adjusts the Actor's actions. Experiments show that agents pre-trained in different environments, including Gym and MuJoCo, adapt to new, dynamics-changed environments better than the original methods do. Moreover, the proposed method outperforms a baseline that simply learns from scratch on some of the original tasks.
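The abstract specifies the Adaptor only at a high level: its kernel shares the Critic's architecture, and it adjusts the Actor's actions. Below is a minimal PyTorch sketch of one plausible reading; the gradient-based adjustment rule, the step_size parameter, and all class names are illustrative assumptions, not details confirmed by the paper.

```python
# Sketch of an actor-critic agent with an added Adaptor component.
# ASSUMPTION: the abstract does not state how the Adaptor adjusts actions;
# the rule shown here (a single gradient-ascent step on the Adaptor's
# value estimate) is one plausible reading, purely for illustration.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """(state, action) -> scalar value. Used for both Critic and Adaptor,
    since the Adaptor's kernel has the same structure as the Critic."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


class Actor(nn.Module):
    """Deterministic policy: state -> action in [-1, 1]."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class AdaptedAgent(nn.Module):
    """Pre-trained Actor/Critic plus an Adaptor. In this sketch only the
    Adaptor would be trained in the dynamics-changed environment, while
    the pre-trained Actor and Critic stay frozen."""

    def __init__(self, state_dim: int, action_dim: int, step_size: float = 0.1):
        super().__init__()
        self.actor = Actor(state_dim, action_dim)
        self.critic = QNetwork(state_dim, action_dim)   # pre-trained value net
        self.adaptor = QNetwork(state_dim, action_dim)  # same structure as Critic
        self.step_size = step_size  # assumed hyperparameter, not from the paper

    def act(self, state: torch.Tensor) -> torch.Tensor:
        base_action = self.actor(state).detach()
        # Assumed adjustment: nudge the pre-trained action along the gradient
        # of the Adaptor's value estimate under the new dynamics.
        action = base_action.clone().requires_grad_(True)
        value = self.adaptor(state, action).sum()
        (grad,) = torch.autograd.grad(value, action)
        return (base_action + self.step_size * grad).clamp(-1.0, 1.0)


# Example usage (dimensions match MuJoCo HalfCheetah, for illustration):
agent = AdaptedAgent(state_dim=17, action_dim=6)
action = agent.act(torch.randn(1, 17))
```

Under this reading, only the Adaptor is fitted to the changed dynamics, which would be consistent with the abstract's claim that pre-trained agents adapt faster than a baseline learning from scratch.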