{"title":"通过解耦网络学习快速掌握亚马逊游戏","authors":"G. Q. Zhang, Xiaoyang Chen, Ruidong Chang, Yuhang Zhang, Cong Wang, Luyi Bai, Junwei Wang, Changming Xu","doi":"10.1109/IJCNN52387.2021.9534274","DOIUrl":null,"url":null,"abstract":"In this work, we propose a deep reinforcement learning (DRL) algorithm DoubleJump which can master the game of Amazons efficiently. To address the bottleneck problem of sparse supervision signal in DRL, we split the neural network into rule network and skill network, using huge amounts of inexpensive data with game rule information and scarce data containing game skill information to train two networks respectively. Besides, we split the three sub-actions of each action into independent states during Monte-Carlo tree search (MCTS), to improve the probability of finding the global optimal state and reduce the average branching factor. The experimental results show our algorithm reaches about 130:70 in the zero-knowledge learning compared with the AlphaGo Zero algorithm, significantly improves the learning speed, and then alleviates the severe dependence on computing resources.","PeriodicalId":396583,"journal":{"name":"2021 International Joint Conference on Neural Networks (IJCNN)","volume":"163 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Mastering the Game of Amazons Fast by Decoupling Network Learning\",\"authors\":\"G. Q. Zhang, Xiaoyang Chen, Ruidong Chang, Yuhang Zhang, Cong Wang, Luyi Bai, Junwei Wang, Changming Xu\",\"doi\":\"10.1109/IJCNN52387.2021.9534274\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this work, we propose a deep reinforcement learning (DRL) algorithm DoubleJump which can master the game of Amazons efficiently. To address the bottleneck problem of sparse supervision signal in DRL, we split the neural network into rule network and skill network, using huge amounts of inexpensive data with game rule information and scarce data containing game skill information to train two networks respectively. Besides, we split the three sub-actions of each action into independent states during Monte-Carlo tree search (MCTS), to improve the probability of finding the global optimal state and reduce the average branching factor. 
The experimental results show our algorithm reaches about 130:70 in the zero-knowledge learning compared with the AlphaGo Zero algorithm, significantly improves the learning speed, and then alleviates the severe dependence on computing resources.\",\"PeriodicalId\":396583,\"journal\":{\"name\":\"2021 International Joint Conference on Neural Networks (IJCNN)\",\"volume\":\"163 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Joint Conference on Neural Networks (IJCNN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IJCNN52387.2021.9534274\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN52387.2021.9534274","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
In this work, we propose DoubleJump, a deep reinforcement learning (DRL) algorithm that masters the game of Amazons efficiently. To address the bottleneck of sparse supervision signals in DRL, we split the neural network into a rule network and a skill network, training them respectively on large amounts of inexpensive data carrying game-rule information and on scarce data containing game-skill information. In addition, during Monte-Carlo tree search (MCTS) we split the three sub-actions of each action into independent states, which improves the probability of finding the globally optimal state and reduces the average branching factor. Experimental results show that, in zero-knowledge learning, our algorithm reaches a result of about 130:70 against the AlphaGo Zero algorithm; it significantly improves learning speed and thereby alleviates the heavy dependence on computing resources.
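The decoupling idea the abstract describes, a rule network trained on abundant, cheaply generated rule data and a skill network trained on scarce skill data, can be sketched as below. This is a minimal illustrative reconstruction, not the authors' architecture: the board encoding (three planes), all layer sizes, and the reading of "rule data" as move-legality labels are our assumptions.

```python
# Hedged sketch of the rule/skill network decoupling (our reconstruction,
# not the paper's code). Assumed encoding: 3 input planes over a 10x10
# board; one "sub-action" = choosing one of the 100 squares.
import torch
import torch.nn as nn

BOARD_PLANES = 3                     # assumed: own queens / enemy queens / arrows
BOARD_SIZE = 10                      # standard Amazons board
N_MOVES = BOARD_SIZE * BOARD_SIZE    # assumed sub-action space: pick a square


class RuleNetwork(nn.Module):
    """Trained on abundant, rule-derived labels (e.g. move legality)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(BOARD_PLANES, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * BOARD_SIZE * BOARD_SIZE, N_MOVES),
        )

    def forward(self, x):
        return self.body(x)          # legality logits, one per square


class SkillNetwork(nn.Module):
    """Trained on scarce self-play data: policy and value heads."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(BOARD_PLANES, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.policy = nn.Linear(32 * BOARD_SIZE * BOARD_SIZE, N_MOVES)
        self.value = nn.Linear(32 * BOARD_SIZE * BOARD_SIZE, 1)

    def forward(self, x):
        h = self.trunk(x)
        return self.policy(h), torch.tanh(self.value(h))
```

The point of the split is that the two data streams have very different costs: legality labels can be generated for free from the game rules (e.g. trained with a per-square binary cross-entropy loss), while policy and value targets require expensive self-play, so only the skill network depends on them.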
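The second idea, splitting each Amazons move's three sub-actions (pick a queen, move it, shoot an arrow) into independent states during MCTS, directly attacks the game's large branching factor. The sketch below, again our own illustration rather than the paper's code, enumerates the layers separately to show how the count of atomic (queen, destination, arrow) actions compares with the per-layer branching a decomposed search faces.

```python
# Hedged sketch of the sub-action decomposition (our illustration, not the
# paper's code). A full Amazons move is (queen, destination, arrow); splitting
# it into three consecutive decision states means the search tree branches
# over one layer at a time instead of over all triples at once.
from itertools import product

N = 10                                             # 10x10 Amazons board
DIRS = [d for d in product((-1, 0, 1), repeat=2) if d != (0, 0)]


def queen_reachable(board, src):
    """Squares reachable from src along empty queen lines (moves or arrows)."""
    out = []
    for dr, dc in DIRS:
        r, c = src[0] + dr, src[1] + dc
        while 0 <= r < N and 0 <= c < N and board[r][c] == ".":
            out.append((r, c))
            r, c = r + dr, c + dc
    return out


def branching(board, queens):
    """Compare atomic-action count with the first decomposed layer's width."""
    picks = [q for q in queens if queen_reachable(board, q)]
    atomic = 0
    for q in picks:
        for dst in queen_reachable(board, q):
            # Temporarily play the queen move, then count legal arrow squares
            # (the vacated square is a legal arrow target, as in the rules).
            board[q[0]][q[1]], board[dst[0]][dst[1]] = ".", "Q"
            atomic += len(queen_reachable(board, dst))
            board[q[0]][q[1]], board[dst[0]][dst[1]] = "Q", "."
    return len(picks), atomic


if __name__ == "__main__":
    board = [["."] * N for _ in range(N)]
    queens = [(0, 3), (0, 6), (3, 0), (3, 9)]      # one side's four queens
    for r, c in queens:
        board[r][c] = "Q"
    first_layer, atomic = branching(board, queens)
    print(f"atomic (queen, destination, arrow) actions: {atomic}")
    print(f"first-layer branching after decomposition:  {first_layer}")
```

On an opening-like position the atomic count runs into the thousands while each decomposed layer stays in the tens, which is consistent with the abstract's claim that the split reduces the average branching factor.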