{"title":"动态环境下的自适应利润分享强化学习方法","authors":"Sadamori Koujaku, Kota Watanabe, H. Igarashi","doi":"10.1109/ICMLA.2011.25","DOIUrl":null,"url":null,"abstract":"In this paper, an Adaptive Forgettable Profit Sharing reinforcement learning method is introduced. This method enables agents to adapt the environmental changes very quickly. It can be used to learn the robust and effective actions in the uncertain environments which have the non-markov property, especially the partial observable markov process (POMDP). Profit Sharing learns rational policy that is easy to be learned and results in good behavior in POMDP. However, the policy becomes worse in the dynamic and huge environment that changes frequently and require the lots of actions to achieve the goal. In order to handle such kind of environment, the forgetting, which gives the adaptability and rationality to Profit Sharing, is implemented. This method allows the agent to forget past experiences that reduce the rationality of its policy. The usefulness of the proposed algorithm is demonstrated through the numerical examples.","PeriodicalId":439926,"journal":{"name":"2011 10th International Conference on Machine Learning and Applications and Workshops","volume":"143 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Adaptive Profit Sharing Reinforcement Learning Method for Dynamic Environment\",\"authors\":\"Sadamori Koujaku, Kota Watanabe, H. Igarashi\",\"doi\":\"10.1109/ICMLA.2011.25\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, an Adaptive Forgettable Profit Sharing reinforcement learning method is introduced. This method enables agents to adapt the environmental changes very quickly. It can be used to learn the robust and effective actions in the uncertain environments which have the non-markov property, especially the partial observable markov process (POMDP). Profit Sharing learns rational policy that is easy to be learned and results in good behavior in POMDP. However, the policy becomes worse in the dynamic and huge environment that changes frequently and require the lots of actions to achieve the goal. In order to handle such kind of environment, the forgetting, which gives the adaptability and rationality to Profit Sharing, is implemented. This method allows the agent to forget past experiences that reduce the rationality of its policy. 
The usefulness of the proposed algorithm is demonstrated through the numerical examples.\",\"PeriodicalId\":439926,\"journal\":{\"name\":\"2011 10th International Conference on Machine Learning and Applications and Workshops\",\"volume\":\"143 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-12-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 10th International Conference on Machine Learning and Applications and Workshops\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMLA.2011.25\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 10th International Conference on Machine Learning and Applications and Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2011.25","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Adaptive Profit Sharing Reinforcement Learning Method for Dynamic Environment
In this paper, an Adaptive Forgettable Profit Sharing reinforcement learning method is introduced. The method enables agents to adapt to environmental changes very quickly, and it can learn robust and effective actions in uncertain environments with non-Markovian properties, in particular partially observable Markov decision processes (POMDPs). Profit Sharing learns a rational policy that is easy to acquire and yields good behavior in POMDPs. However, the learned policy degrades in large, dynamic environments that change frequently and require many actions to reach the goal. To handle such environments, a forgetting mechanism is introduced that gives Profit Sharing both adaptability and rationality: it allows the agent to forget past experiences that reduce the rationality of its policy. The usefulness of the proposed algorithm is demonstrated through numerical examples.
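As a rough illustration of the idea described in the abstract, the sketch below implements a plain Profit Sharing agent with an added forgetting step. The roulette-style action selection, the geometric credit assignment, the parameter values, and the per-episode weight decay are illustrative assumptions; they are not the paper's actual Adaptive Forgettable Profit Sharing update, which is not reproduced here.

import random
from collections import defaultdict

# A minimal sketch of Profit Sharing with a forgetting step.
# Assumption: the credit-assignment ratio and the decay rule below are
# illustrative choices, not the rule proposed in the paper.

class ForgetfulProfitSharingAgent:
    def __init__(self, actions, gamma=0.3, forget_rate=0.05):
        self.actions = actions              # available actions
        self.gamma = gamma                  # geometric credit-assignment ratio
        self.forget_rate = forget_rate      # fraction of weight forgotten per episode
        self.w = defaultdict(lambda: 1.0)   # rule weights w(observation, action)
        self.episode = []                   # (observation, action) trace of the current episode

    def select_action(self, obs):
        # Roulette selection: probability proportional to rule weight.
        weights = [self.w[(obs, a)] for a in self.actions]
        action = random.choices(self.actions, weights=weights, k=1)[0]
        self.episode.append((obs, action))
        return action

    def reinforce(self, reward):
        # Profit Sharing: when the goal is reached, share the reward along the
        # episode, with credit decreasing geometrically away from the goal.
        credit = reward
        for obs, action in reversed(self.episode):
            self.w[(obs, action)] += credit
            credit *= self.gamma
        self.episode.clear()

    def forget(self):
        # Forgetting: decay every rule weight back toward its initial value, so
        # experience gathered before an environmental change gradually loses
        # its influence on action selection.
        for key in self.w:
            self.w[key] = (1.0 - self.forget_rate) * self.w[key] + self.forget_rate * 1.0

The forgetting step pulls all rule weights toward their initial value rather than erasing them outright, so rules that are still being reinforced keep their advantage while stale rules fade; how aggressively to forget (here, forget_rate) is the kind of adaptation the paper's method addresses.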