Pseudo Reward and Action Importance Classification for Sparse Reward Problem

Qingtong Wu, Dawei Feng, Yuanzhao Zhai, Bo Ding, Jie Luo
{"title":"稀疏奖励问题的伪奖励与动作重要性分类","authors":"Qingtong Wu, Dawei Feng, Yuanzhao Zhai, Bo Ding, Jie Luo","doi":"10.1145/3529836.3529918","DOIUrl":null,"url":null,"abstract":"Deep Reinforcement Learning(DRL) has witnessed great success in many fields like robotics, games, self-driving cars in recent years. However, the sparse reward problem where a meager amount of states in the state space that return a feedback signal hinders the widespread application of DRL in many real-world tasks. Reward shaping with carefully designed intrinsic rewards provides an effective way to relieve it. Nevertheless, useful intrinsic rewards need rich domain knowledge and extensive fine-tuning, which makes this approach unavailable in many cases. To solve this problem, we propose a framework called PRAIC which only utilizes roughly defined intrinsic rewards. Specifically, the PRAIC consists of a pseudo reward network to extract reward-related features and an action importance network to classify actions according to their importance in different scenarios. Experiments on the multi-agent particle environment and Google Research Football game demonstrate the effectiveness and superior performance of the proposed method.","PeriodicalId":285191,"journal":{"name":"2022 14th International Conference on Machine Learning and Computing (ICMLC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Pseudo Reward and Action Importance Classification for Sparse Reward Problem\",\"authors\":\"Qingtong Wu, Dawei Feng, Yuanzhao Zhai, Bo Ding, Jie Luo\",\"doi\":\"10.1145/3529836.3529918\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep Reinforcement Learning(DRL) has witnessed great success in many fields like robotics, games, self-driving cars in recent years. However, the sparse reward problem where a meager amount of states in the state space that return a feedback signal hinders the widespread application of DRL in many real-world tasks. Reward shaping with carefully designed intrinsic rewards provides an effective way to relieve it. Nevertheless, useful intrinsic rewards need rich domain knowledge and extensive fine-tuning, which makes this approach unavailable in many cases. To solve this problem, we propose a framework called PRAIC which only utilizes roughly defined intrinsic rewards. Specifically, the PRAIC consists of a pseudo reward network to extract reward-related features and an action importance network to classify actions according to their importance in different scenarios. 
Experiments on the multi-agent particle environment and Google Research Football game demonstrate the effectiveness and superior performance of the proposed method.\",\"PeriodicalId\":285191,\"journal\":{\"name\":\"2022 14th International Conference on Machine Learning and Computing (ICMLC)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-02-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 14th International Conference on Machine Learning and Computing (ICMLC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3529836.3529918\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 14th International Conference on Machine Learning and Computing (ICMLC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3529836.3529918","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Deep Reinforcement Learning (DRL) has achieved great success in recent years in many fields such as robotics, games, and self-driving cars. However, the sparse reward problem, in which only a small fraction of states in the state space return a feedback signal, hinders the widespread application of DRL to many real-world tasks. Reward shaping with carefully designed intrinsic rewards provides an effective way to relieve this problem. Nevertheless, useful intrinsic rewards require rich domain knowledge and extensive fine-tuning, which makes this approach impractical in many cases. To solve this problem, we propose a framework called PRAIC that uses only roughly defined intrinsic rewards. Specifically, PRAIC consists of a pseudo reward network that extracts reward-related features and an action importance network that classifies actions according to their importance in different scenarios. Experiments on the multi-agent particle environment and the Google Research Football game demonstrate the effectiveness and superior performance of the proposed method.
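The abstract names two components (a pseudo reward network and an action importance network) but gives no implementation details. The following is a minimal, hypothetical PyTorch sketch of how such components might be wired into a shaped reward; all layer sizes, the importance-weighted combination rule, and the `shaped_reward` helper are illustrative assumptions, not the authors' actual design.

```python
# Hypothetical sketch of the two PRAIC components described in the abstract.
# Layer sizes, the combination rule, and all hyperparameters are assumptions
# for illustration only; the paper's actual design may differ.
import torch
import torch.nn as nn


class PseudoRewardNetwork(nn.Module):
    """Maps a state plus a roughly defined intrinsic reward to a pseudo reward."""

    def __init__(self, state_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden_dim),  # state features + rough intrinsic reward
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor, rough_intrinsic: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, rough_intrinsic], dim=-1)).squeeze(-1)


class ActionImportanceNetwork(nn.Module):
    """Classifies how important each action is in the current scenario."""

    def __init__(self, state_dim: int, n_actions: int, n_classes: int = 2, hidden_dim: int = 64):
        super().__init__()
        self.n_actions = n_actions
        self.n_classes = n_classes
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_actions * n_classes),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        logits = self.net(state).view(-1, self.n_actions, self.n_classes)
        return logits.softmax(dim=-1)  # per-action importance distribution


def shaped_reward(env_reward, pseudo_reward, importance_weight, beta: float = 0.1):
    """One plausible shaping rule: scale the pseudo reward by the importance of
    the chosen action and add it to the environment reward."""
    return env_reward + beta * importance_weight * pseudo_reward
```

In this reading, the pseudo reward network refines the coarse intrinsic signal into a state-dependent bonus, while the action importance classifier gates how strongly that bonus is applied for the action actually taken; how the paper actually combines the two outputs is not specified in the abstract.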