Pseudo Reward and Action Importance Classification for Sparse Reward Problem

Qingtong Wu, Dawei Feng, Yuanzhao Zhai, Bo Ding, Jie Luo
{"title":"稀疏奖励问题的伪奖励与动作重要性分类","authors":"Qingtong Wu, Dawei Feng, Yuanzhao Zhai, Bo Ding, Jie Luo","doi":"10.1145/3529836.3529918","DOIUrl":null,"url":null,"abstract":"Deep Reinforcement Learning(DRL) has witnessed great success in many fields like robotics, games, self-driving cars in recent years. However, the sparse reward problem where a meager amount of states in the state space that return a feedback signal hinders the widespread application of DRL in many real-world tasks. Reward shaping with carefully designed intrinsic rewards provides an effective way to relieve it. Nevertheless, useful intrinsic rewards need rich domain knowledge and extensive fine-tuning, which makes this approach unavailable in many cases. To solve this problem, we propose a framework called PRAIC which only utilizes roughly defined intrinsic rewards. Specifically, the PRAIC consists of a pseudo reward network to extract reward-related features and an action importance network to classify actions according to their importance in different scenarios. Experiments on the multi-agent particle environment and Google Research Football game demonstrate the effectiveness and superior performance of the proposed method.","PeriodicalId":285191,"journal":{"name":"2022 14th International Conference on Machine Learning and Computing (ICMLC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Pseudo Reward and Action Importance Classification for Sparse Reward Problem\",\"authors\":\"Qingtong Wu, Dawei Feng, Yuanzhao Zhai, Bo Ding, Jie Luo\",\"doi\":\"10.1145/3529836.3529918\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep Reinforcement Learning(DRL) has witnessed great success in many fields like robotics, games, self-driving cars in recent years. However, the sparse reward problem where a meager amount of states in the state space that return a feedback signal hinders the widespread application of DRL in many real-world tasks. Reward shaping with carefully designed intrinsic rewards provides an effective way to relieve it. Nevertheless, useful intrinsic rewards need rich domain knowledge and extensive fine-tuning, which makes this approach unavailable in many cases. To solve this problem, we propose a framework called PRAIC which only utilizes roughly defined intrinsic rewards. Specifically, the PRAIC consists of a pseudo reward network to extract reward-related features and an action importance network to classify actions according to their importance in different scenarios. 
Experiments on the multi-agent particle environment and Google Research Football game demonstrate the effectiveness and superior performance of the proposed method.\",\"PeriodicalId\":285191,\"journal\":{\"name\":\"2022 14th International Conference on Machine Learning and Computing (ICMLC)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-02-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 14th International Conference on Machine Learning and Computing (ICMLC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3529836.3529918\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 14th International Conference on Machine Learning and Computing (ICMLC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3529836.3529918","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Deep Reinforcement Learning (DRL) has achieved great success in recent years in many fields such as robotics, games, and self-driving cars. However, the sparse reward problem, in which only a small fraction of states in the state space return a feedback signal, hinders the widespread application of DRL to many real-world tasks. Reward shaping with carefully designed intrinsic rewards provides an effective way to relieve this problem. Nevertheless, useful intrinsic rewards require rich domain knowledge and extensive fine-tuning, which makes this approach impractical in many cases. To solve this problem, we propose a framework called PRAIC that uses only roughly defined intrinsic rewards. Specifically, PRAIC consists of a pseudo reward network that extracts reward-related features and an action importance network that classifies actions according to their importance in different scenarios. Experiments on the multi-agent particle environment and the Google Research Football game demonstrate the effectiveness and superior performance of the proposed method.
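The abstract names two components (a pseudo reward network and an action importance network) but gives no implementation details. The following is a minimal, hypothetical PyTorch sketch of how such components might be wired into a shaped reward; all layer sizes, the importance-weighted combination rule, and the `shaped_reward` helper are illustrative assumptions, not the authors' actual design.

```python
# Hypothetical sketch of the two PRAIC components described in the abstract.
# Layer sizes, the combination rule, and all hyperparameters are assumptions
# for illustration only; the paper's actual design may differ.
import torch
import torch.nn as nn


class PseudoRewardNetwork(nn.Module):
    """Maps a state plus a roughly defined intrinsic reward to a pseudo reward."""

    def __init__(self, state_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden_dim),  # state features + rough intrinsic reward
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor, rough_intrinsic: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, rough_intrinsic], dim=-1)).squeeze(-1)


class ActionImportanceNetwork(nn.Module):
    """Classifies how important each action is in the current scenario."""

    def __init__(self, state_dim: int, n_actions: int, n_classes: int = 2, hidden_dim: int = 64):
        super().__init__()
        self.n_actions = n_actions
        self.n_classes = n_classes
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_actions * n_classes),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        logits = self.net(state).view(-1, self.n_actions, self.n_classes)
        return logits.softmax(dim=-1)  # per-action importance distribution


def shaped_reward(env_reward, pseudo_reward, importance_weight, beta: float = 0.1):
    """One plausible shaping rule: scale the pseudo reward by the importance of
    the chosen action and add it to the environment reward."""
    return env_reward + beta * importance_weight * pseudo_reward
```

In this reading, the pseudo reward network refines the coarse intrinsic signal into a state-dependent bonus, while the action importance classifier gates how strongly that bonus is applied for the action actually taken; how the paper actually combines the two outputs is not specified in the abstract.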