{"title":"强化学习中有效探索的自适应探索网络策略","authors":"Min Li, William Zhu","doi":"10.1117/12.2667206","DOIUrl":null,"url":null,"abstract":"How to achieve effective exploration is a key issue in the training of Reinforcement learning. The known exploration policy addresses this issue by adding noise to the policy for guiding the agent exploring. However, it has two problems that 1) the exploration scale has low adaptability to the training stability due to the added noise from a fixed distribution and 2) the policy learned after the training may be locally optimal because the exploration is insufficient. Adaptive exploration policy addresses the first problem by adjusting the noise scale according to the training stability. But the learned policy may still be locally optimal. In this paper, we propose an adaptive exploration network policy to address this problem by considering exploration direction. The motivation is that the agent should explore in the direction of increasing the sample diversity to avoid the local optimum caused by insufficient exploration. Firstly, we construct a prediction network to predict the next state after the agent makes a decision at the current state. Secondly, we propose an exploration network to generate the exploration direction. To increase the sample diversity, this network is trained by maximizing the distance between the predicted next state from prediction network and the current state. Then we adjust the exploration scale to adapt to the training stability. Finally, we propose adaptive exploration network policy based on the new noise constructed by the generated exploration direction and the adaptive exploration scale. Experiments illustrate the effectiveness of our method.","PeriodicalId":137914,"journal":{"name":"International Conference on Artificial Intelligence, Virtual Reality, and Visualization","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adaptive exploration network policy for effective exploration in reinforcement learning\",\"authors\":\"Min Li, William Zhu\",\"doi\":\"10.1117/12.2667206\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"How to achieve effective exploration is a key issue in the training of Reinforcement learning. The known exploration policy addresses this issue by adding noise to the policy for guiding the agent exploring. However, it has two problems that 1) the exploration scale has low adaptability to the training stability due to the added noise from a fixed distribution and 2) the policy learned after the training may be locally optimal because the exploration is insufficient. Adaptive exploration policy addresses the first problem by adjusting the noise scale according to the training stability. But the learned policy may still be locally optimal. In this paper, we propose an adaptive exploration network policy to address this problem by considering exploration direction. The motivation is that the agent should explore in the direction of increasing the sample diversity to avoid the local optimum caused by insufficient exploration. Firstly, we construct a prediction network to predict the next state after the agent makes a decision at the current state. Secondly, we propose an exploration network to generate the exploration direction. 
To increase the sample diversity, this network is trained by maximizing the distance between the predicted next state from prediction network and the current state. Then we adjust the exploration scale to adapt to the training stability. Finally, we propose adaptive exploration network policy based on the new noise constructed by the generated exploration direction and the adaptive exploration scale. Experiments illustrate the effectiveness of our method.\",\"PeriodicalId\":137914,\"journal\":{\"name\":\"International Conference on Artificial Intelligence, Virtual Reality, and Visualization\",\"volume\":\"50 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Artificial Intelligence, Virtual Reality, and Visualization\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1117/12.2667206\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Artificial Intelligence, Virtual Reality, and Visualization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2667206","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Adaptive exploration network policy for effective exploration in reinforcement learning
How to achieve effective exploration is a key issue in the training of reinforcement learning. Existing exploration policies address this issue by adding noise to the policy to guide the agent's exploration. However, this approach has two problems: 1) the exploration scale adapts poorly to the training stability because the noise is drawn from a fixed distribution, and 2) the policy learned after training may be locally optimal because the exploration is insufficient. Adaptive exploration policies address the first problem by adjusting the noise scale according to the training stability, but the learned policy may still be locally optimal. In this paper, we propose an adaptive exploration network policy that addresses this problem by also considering the exploration direction. The motivation is that the agent should explore in the direction that increases sample diversity, so as to avoid the local optima caused by insufficient exploration. First, we construct a prediction network that predicts the next state after the agent makes a decision in the current state. Second, we propose an exploration network that generates the exploration direction. To increase sample diversity, this network is trained by maximizing the distance between the next state predicted by the prediction network and the current state. We then adjust the exploration scale to adapt to the training stability. Finally, we propose the adaptive exploration network policy, based on new noise constructed from the generated exploration direction and the adaptive exploration scale. Experiments illustrate the effectiveness of our method.
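The abstract describes four components: a prediction network, an exploration network trained to maximize the distance between the predicted next state and the current state, an adaptive exploration scale, and noise built from the generated direction and that scale. Below is a minimal PyTorch sketch of how such components might fit together; the class and function names, network sizes, and the simple distance-based loss are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch only: names and architectures are assumptions made for
# illustration, not taken from the paper or any released code.
import torch
import torch.nn as nn


class PredictionNet(nn.Module):
    """Predicts the next state from the current state and the chosen action."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


class ExplorationNet(nn.Module):
    """Maps the current state to an exploration direction in action space."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)


def exploration_loss(pred_net, expl_net, actor, state):
    """Train the exploration network by maximizing the distance between the
    predicted next state and the current state (minimize its negative), so that
    exploring along the generated direction tends to increase sample diversity."""
    direction = expl_net(state)
    perturbed_action = actor(state) + direction
    next_state_pred = pred_net(state, perturbed_action)
    return -torch.norm(next_state_pred - state, dim=-1).mean()


def noisy_action(actor, expl_net, state, scale):
    """Exploration noise = adaptive scale * generated direction, added to the
    policy output; `scale` would be adjusted from some measure of training
    stability (e.g. the variance of recent returns)."""
    with torch.no_grad():
        return actor(state) + scale * expl_net(state)
```

In this sketch, `actor` stands for any deterministic policy network mapping states to actions; how the exploration scale is computed from training stability is left abstract, since the abstract does not specify it.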