SNSE: State Novelty Sampling Exploration

Shaojie Li, Xiangfeng Luo, Zhenyu Zhang, Hang Yu, Shaorong Xie
{"title":"国家新颖性抽样探索","authors":"Shaojie Li, Xiangfeng Luo, Zhenyu Zhang, Hang Yu, Shaorong Xie","doi":"10.1109/ccis57298.2022.10016361","DOIUrl":null,"url":null,"abstract":"Exploration in sparse reward reinforcement learning remains an open challenge. Many state-of-the-art methods use intrinsic motivation to complement the sparse extrinsic reward signal, giving the agent more opportunities to receive feedback during exploration. Commonly these signals are summed directly as intrinsic rewards and extrinsic rewards. However intrinsic rewards are non-stationary, which directly contaminates extrinsic environmental rewards and changes the optimization objective of the policy to maximize the sum of intrinsic and extrinsic rewards. This could lead the agent to a mixture policy that neither conducts exploration nor task score fulfillment resolutely. This adopts a simple and generic perspective, where we explicitly disentangle extrinsic reward and intrinsic reward. Through the multiple sampling mechanism, our method, State Novelty Sampling Exploration (SNSE), cleverly decouples the intrinsic and extrinsic rewards, so that the two can take their respective roles. Letting intrinsic rewards directly guide the agent to explore novel samples during the exploration phase, and that our policy optimization goal is still to maximize extrinsic rewards. In sparse rewards environments, our experiments show that SNSE can improve the efficiency of exploring unknown states and improve the final performance of the policy. Under dense rewards, SNSE do not make the policy produce optimization bias and cause performance loss.","PeriodicalId":374660,"journal":{"name":"2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SNSE: State Novelty Sampling Exploration\",\"authors\":\"Shaojie Li, Xiangfeng Luo, Zhenyu Zhang, Hang Yu, Shaorong Xie\",\"doi\":\"10.1109/ccis57298.2022.10016361\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Exploration in sparse reward reinforcement learning remains an open challenge. Many state-of-the-art methods use intrinsic motivation to complement the sparse extrinsic reward signal, giving the agent more opportunities to receive feedback during exploration. Commonly these signals are summed directly as intrinsic rewards and extrinsic rewards. However intrinsic rewards are non-stationary, which directly contaminates extrinsic environmental rewards and changes the optimization objective of the policy to maximize the sum of intrinsic and extrinsic rewards. This could lead the agent to a mixture policy that neither conducts exploration nor task score fulfillment resolutely. This adopts a simple and generic perspective, where we explicitly disentangle extrinsic reward and intrinsic reward. Through the multiple sampling mechanism, our method, State Novelty Sampling Exploration (SNSE), cleverly decouples the intrinsic and extrinsic rewards, so that the two can take their respective roles. Letting intrinsic rewards directly guide the agent to explore novel samples during the exploration phase, and that our policy optimization goal is still to maximize extrinsic rewards. In sparse rewards environments, our experiments show that SNSE can improve the efficiency of exploring unknown states and improve the final performance of the policy. 
Under dense rewards, SNSE do not make the policy produce optimization bias and cause performance loss.\",\"PeriodicalId\":374660,\"journal\":{\"name\":\"2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS)\",\"volume\":\"55 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ccis57298.2022.10016361\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ccis57298.2022.10016361","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Exploration in sparse-reward reinforcement learning remains an open challenge. Many state-of-the-art methods use intrinsic motivation to complement the sparse extrinsic reward signal, giving the agent more opportunities to receive feedback during exploration. Commonly, the intrinsic and extrinsic reward signals are simply summed. However, intrinsic rewards are non-stationary, which contaminates the extrinsic environmental rewards and changes the policy's optimization objective into maximizing the sum of intrinsic and extrinsic rewards. This can lead the agent to a mixed policy that pursues neither exploration nor task performance resolutely. This paper adopts a simple and generic perspective in which extrinsic and intrinsic rewards are explicitly disentangled. Through a multiple-sampling mechanism, our method, State Novelty Sampling Exploration (SNSE), decouples the intrinsic and extrinsic rewards so that each plays its own role: intrinsic rewards directly guide the agent toward novel samples during the exploration phase, while the policy optimization objective remains to maximize extrinsic rewards. In sparse-reward environments, our experiments show that SNSE improves the efficiency of exploring unknown states and the final performance of the policy. Under dense rewards, SNSE introduces no optimization bias into the policy and causes no performance loss.
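
The abstract describes the decoupling only at a high level. The Python sketch below is a hypothetical illustration of that idea under our own assumptions, not the paper's implementation: the agent samples several candidate actions, executes the one whose previewed next state looks most novel, and passes only the extrinsic reward on to policy optimization. CountNovelty, policy.sample_action, and env.predict_next_state are invented placeholders; SNSE's actual novelty measure and sampling mechanism may differ.

    import numpy as np

    class CountNovelty:
        """Toy novelty estimator: states visited less often score as more novel."""

        def __init__(self):
            self.counts = {}

        def _key(self, state):
            return tuple(np.asarray(state, dtype=float).round(2).ravel())

        def score(self, state):
            return 1.0 / np.sqrt(self.counts.get(self._key(state), 0) + 1)

        def update(self, state):
            k = self._key(state)
            self.counts[k] = self.counts.get(k, 0) + 1

    def explore_step(env, policy, novelty, state, n_candidates=8):
        # Sample several candidate actions from the current policy.
        candidates = [policy.sample_action(state) for _ in range(n_candidates)]
        # Assumed hook: a learned dynamics model (or the simulator itself)
        # previews the next state for each candidate action.
        previews = [env.predict_next_state(state, a) for a in candidates]
        # Intrinsic novelty only decides WHICH candidate to execute ...
        best = int(np.argmax([novelty.score(s) for s in previews]))
        next_state, extrinsic_reward, done, info = env.step(candidates[best])
        novelty.update(next_state)
        # ... while only the extrinsic reward is returned for policy optimization,
        # so the learning objective is never contaminated by the novelty bonus.
        return candidates[best], next_state, extrinsic_reward, done

In this reading, novelty shapes data collection rather than the loss, which is one way to keep the optimization target purely extrinsic.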