A Deep Reinforcement Learning-Based Approach in Porker Game

Yan Kong, Yefeng Rui, Chih-Hsien Hsia
{"title":"A Deep Reinforcement Learning-Based Approach in Porker Game","authors":"Yan Kong Yan Kong, Yefeng Rui Yan Kong, Chih-Hsien Hsia Yefeng Rui","doi":"10.53106/199115992023043402004","DOIUrl":null,"url":null,"abstract":"\n Recent years have witnessed the big success deep reinforcement learning achieved in the domain of card and board games, such as Go, chess and Texas Hold’em poker. However, Dou Di Zhu, a traditional Chinese card game, is still a challenging task for deep reinforcement learning methods due to the enormous action space and the sparse and delayed reward of each action from the environment. Basic reinforcement learning algorithms are more effective in the simple environments which have small action spaces and valuable and concrete reward functions, and unfortunately, are shown not be able to deal with Dou Di Zhu satisfactorily. This work introduces an approach named Two-steps Q-Network based on DQN to playing Dou Di Zhu, which compresses the huge action space through dividing it into two parts according to the rules of Dou Di Zhu and fills in the sparse rewards using inverse reinforcement learning (IRL) through abstracting the reward function from experts’ demonstrations. It is illustrated by the experiments that two-steps Q-network gains great advancements compared with DQN used in Dou Di Zhu.\n \n","PeriodicalId":345067,"journal":{"name":"電腦學刊","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"電腦學刊","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.53106/199115992023043402004","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Recent years have witnessed the great success of deep reinforcement learning in card and board games such as Go, chess, and Texas Hold'em poker. However, Dou Di Zhu, a traditional Chinese card game, remains a challenging task for deep reinforcement learning because of its enormous action space and the sparse, delayed reward each action receives from the environment. Basic reinforcement learning algorithms are most effective in simple environments with small action spaces and informative, concrete reward functions, and have been shown unable to handle Dou Di Zhu satisfactorily. This work introduces Two-step Q-Network, a DQN-based approach to playing Dou Di Zhu that compresses the huge action space by dividing it into two parts according to the rules of the game, and fills in the sparse rewards using inverse reinforcement learning (IRL) to abstract a reward function from expert demonstrations. Experiments show that the Two-step Q-Network achieves substantial improvements over DQN in Dou Di Zhu.
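The abstract does not specify the network architecture, so the following is only a minimal sketch of the two-step action decomposition it describes: a first Q-head selects a coarse action category defined by the rules of Dou Di Zhu, and a second Q-head selects a concrete move within that category. The state dimension, the category count, the per-category move bound, and the layer sizes are all illustrative assumptions, not the paper's published design.

```python
# Hypothetical sketch of a two-step Q-network for Dou Di Zhu.
# Step 1 scores action categories (pass, solo, pair, trio, chain, bomb, ...);
# step 2 scores concrete moves conditioned on the chosen category.
# All sizes below are assumptions for illustration.
import torch
import torch.nn as nn

STATE_DIM = 512          # assumed size of the encoded game state
NUM_CATEGORIES = 15      # assumed number of rule-defined action categories
MAX_MOVES_PER_CAT = 100  # assumed upper bound on concrete moves per category

class TwoStepQNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Step 1: Q-values over coarse action categories.
        self.category_q = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, NUM_CATEGORIES),
        )
        # Step 2: Q-values over concrete moves, conditioned on the chosen
        # category (its one-hot encoding is appended to the state).
        self.move_q = nn.Sequential(
            nn.Linear(STATE_DIM + NUM_CATEGORIES, 256), nn.ReLU(),
            nn.Linear(256, MAX_MOVES_PER_CAT),
        )

    def forward(self, state, legal_cats, legal_moves):
        # legal_cats / legal_moves are boolean masks from the rules engine;
        # illegal entries are masked to -inf before the argmax.
        q_cat = self.category_q(state).masked_fill(~legal_cats, float("-inf"))
        cat = q_cat.argmax(dim=-1)
        cat_onehot = nn.functional.one_hot(cat, NUM_CATEGORIES).float()
        q_move = self.move_q(torch.cat([state, cat_onehot], dim=-1))
        q_move = q_move.masked_fill(~legal_moves, float("-inf"))
        return cat, q_move.argmax(dim=-1)
```

In training, both heads would presumably be updated with the standard DQN temporal-difference target, with the IRL-derived reward substituted for the sparse environment reward; the abstract does not give those details.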