Empirical analysis of the convergence of Double DQN in relation to reward sparsity

Samuel Blad, Martin Längkvist, Franziska Klügl-Frohnmeyer, A. Loutfi

2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), December 2022
DOI: 10.1109/ICMLA55696.2022.00102
Abstract: Q-Networks are used in Reinforcement Learning to model the expected return from every action at a given state. When training Q-Networks, external reward signals are propagated to the previously performed actions leading up to each reward. If many actions are required before experiencing a reward, the reward signal is distributed across all those actions, where some actions may have greater impact on the reward than others. As the number of significant actions between rewards increases, the relative importance of each action decreases. If actions have too little importance, their impact might be overshadowed by noise in a deep neural network model, potentially causing convergence issues. In this work, we empirically test the limits of increasing the number of actions leading up to a reward in a simple grid-world environment. We show in our experiments that even though the training error surpasses the reward signal attributed to each action, the model is still able to learn a smooth enough value representation.
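The mechanism the abstract describes, reward signals propagating backwards to earlier actions through bootstrapped targets, can be made concrete with a small sketch. The following is an illustrative Double DQN target computation in Python, not the authors' implementation: tabular NumPy arrays stand in for the deep online and target networks used in the paper, and all names here (double_dqn_target, q_online, q_target, the grid size) are hypothetical.

import numpy as np

def double_dqn_target(reward, next_state, done, q_online, q_target, gamma=0.99):
    # Double DQN target: the online network selects the next action,
    # the target network evaluates it. The reward is propagated only one
    # bootstrap step backwards per update, so with sparse rewards many
    # updates are needed before early actions receive any of the signal.
    if done:
        return reward
    best_action = np.argmax(q_online[next_state])
    return reward + gamma * q_target[next_state, best_action]

# Hypothetical usage on a single transition (s=3, a=2, r=0.0, s'=5)
# in a 4x4 grid world with 4 actions per state.
n_states, n_actions = 16, 4
q_online = np.zeros((n_states, n_actions))
q_target = np.zeros((n_states, n_actions))
y = double_dqn_target(reward=0.0, next_state=5, done=False,
                      q_online=q_online, q_target=q_target)
q_online[3, 2] += 0.1 * (y - q_online[3, 2])  # TD update with learning rate 0.1

Because each update moves the reward only one bootstrap step closer to earlier states, a longer chain of significant actions dilutes the per-action share of the signal, which is the regime the paper probes empirically in its grid-world experiments.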