A Model-Based Exploration Policy in Deep Q-Network

Shuailong Li, Wei Zhang, Yuquan Leng, Xin Zhang
{"title":"A Model-Based Exploration Policy in Deep Q-Network","authors":"Shuailong Li, Wei Zhang, Yuquan Leng, Xin Zhang","doi":"10.1109/dsins54396.2021.9670573","DOIUrl":null,"url":null,"abstract":"Reinforcement learning has successfully been used in many applications and achieved prodigious performance (such as video games), and DQN is a well-known algorithm in RL. However, there are some disadvantages in practical applications, and the exploration and exploitation dilemma is one of them. To solve this problem, common strategies about exploration like ɛ–greedy have risen. Unfortunately, there are sample inefficient and ineffective because of the uncertainty of later exploration. In this paper, we propose a model-based exploration method that learns the state transition model to explore. Using the training rules of machine learning, we can train the state transition model networks to improve exploration efficiency and sample efficiency. We compare our algorithm with ɛ–greedy on the Deep Q-Networks (DQN) algorithm and apply it to the Atari 2600 games. Our algorithm outperforms the decaying ɛ–greedy strategy when we evaluate our algorithm across 14 Atari games in the Arcade Learning Environment (ALE).","PeriodicalId":243724,"journal":{"name":"2021 International Conference on Digital Society and Intelligent Systems (DSInS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Digital Society and Intelligent Systems (DSInS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/dsins54396.2021.9670573","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Reinforcement learning (RL) has been applied successfully in many domains and has achieved impressive performance, for example in video games, and DQN is one of its best-known algorithms. However, DQN has practical drawbacks, among them the exploration-exploitation dilemma. Common exploration strategies such as ε-greedy address this problem, but they are sample-inefficient and can be ineffective because late-stage exploration remains random. In this paper, we propose a model-based exploration method that learns a state transition model and uses it to guide exploration. By training the state-transition-model networks with standard machine-learning procedures, we improve both exploration efficiency and sample efficiency. We compare our method against ε-greedy on the Deep Q-Network (DQN) algorithm applied to Atari 2600 games. Our algorithm outperforms the decaying ε-greedy strategy when evaluated across 14 Atari games in the Arcade Learning Environment (ALE).
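The abstract does not specify the exact exploration rule, so the sketch below is one plausible reading rather than the authors' implementation: a small ensemble of learned transition models is trained on replay-buffer transitions, and on exploration steps the agent takes the action the models disagree on most, in place of ε-greedy's uniformly random action. All names here (TransitionModel, model_based_action, train_transition_models) are hypothetical, and the state is assumed to be a feature vector rather than raw pixels.

```python
# Minimal sketch, NOT the authors' code: one plausible reading of
# model-based exploration for DQN. All names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransitionModel(nn.Module):
    """Predicts the next state from (state, one-hot action). Assumes the
    state is a feature vector (e.g., an encoding of stacked Atari frames)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        a_onehot = F.one_hot(a, self.n_actions).float()
        return self.net(torch.cat([s, a_onehot], dim=-1))

def model_based_action(q_net, ensemble, s, n_actions, explore: bool) -> int:
    """Act greedily w.r.t. Q on exploitation steps. On exploration steps,
    instead of a uniformly random action (as in epsilon-greedy), pick the
    action whose predicted next state the ensemble disagrees on most, i.e.
    the action the learned dynamics are least certain about."""
    if not explore:
        return q_net(s.unsqueeze(0)).argmax(dim=-1).item()
    with torch.no_grad():
        scores = []
        for a in range(n_actions):
            a_t = torch.tensor([a])
            # Predictions from each ensemble member: shape (K, 1, state_dim).
            preds = torch.stack([m(s.unsqueeze(0), a_t) for m in ensemble])
            scores.append(preds.var(dim=0).mean().item())  # needs K >= 2
    return max(range(n_actions), key=lambda a: scores[a])

def train_transition_models(ensemble, optimizers, batch) -> None:
    """Supervised update on replay-buffer transitions: minimize
    ||f(s, a) - s'||^2 for each ensemble member."""
    s, a, s_next = batch  # (B, state_dim), (B,) long, (B, state_dim)
    for model, opt in zip(ensemble, optimizers):
        loss = F.mse_loss(model(s, a), s_next)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Under this reading, the DQN training loop is otherwise unchanged: the exploration branch calls model_based_action instead of sampling uniformly, the transition models are fitted from the same replay buffer as the Q-network, and the exploration probability can still decay over training, as in the decaying ε-greedy baseline the paper compares against.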