TLDR

Abdulrahman Alabduljabbar, Ahmed A. Abusnaina, Ülkü Meteriz-Yildiran, David A. Mohaisen
DOI: 10.1145/3463676.3485608
Published in: Proceedings of the 20th Workshop on Privacy in the Electronic Society
Publication date: 2021-11-15
Citations: 10

Abstract

References

[1] Agent image from Wikimedia Commons.
[2] Henderson et al. "Deep Reinforcement Learning That Matters." 2018.
[3] Tucker et al. "The Mirage of Action-Dependent Baselines in Reinforcement Learning." 2018.
[4] Whiteson et al. "Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning." 2011.
[5] Bellemare et al. "The Arcade Learning Environment: An Evaluation Platform for General Agents." 2013.
[6] Riedmiller et al. "Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark." 2007.
[7] Zhang et al. "A Study on Overfitting in Deep Reinforcement Learning." 2018.

Common pitfalls in evaluating reinforcement learning results:

- Score / discounted return / reward: performance is measured inconsistently across published results.
- Sample efficiency: not a meaningful measure of how well an algorithm performs unless training conditions are held constant.
- Top seeds / best seeds: reporting only the best seeds found skews results in your favour [4].
- Stochasticity of the policy: whether the policy used was stochastic should be stated explicitly.
- Environment start states: some labs may not have access to the environment conditions used, making comparisons unfair.
- Both evaluation details and training details should be reported.
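The seed-reporting pitfall above can be illustrated with a minimal sketch. The function below is hypothetical (not from the paper): it aggregates the final episode returns of several training runs with different random seeds and reports the mean and standard deviation across all seeds, rather than the single best run.

```python
import statistics

def aggregate_returns(returns_per_seed):
    """Summarize final returns across random seeds.

    returns_per_seed: list of per-seed return curves (one list of
    episode returns per training run). Reporting mean +/- std over
    ALL seeds avoids the bias of cherry-picking the best seed.
    """
    final_returns = [curve[-1] for curve in returns_per_seed]
    return {
        "mean": statistics.mean(final_returns),
        "std": statistics.stdev(final_returns),
        "best": max(final_returns),   # shown for contrast, never reported alone
        "n_seeds": len(final_returns),
    }

# Hypothetical return curves from five runs with different seeds.
runs = [
    [10, 50, 200],
    [12, 40, 180],
    [9, 55, 210],
    [11, 45, 190],
    [10, 60, 220],
]
summary = aggregate_returns(runs)
# Reporting only summary["best"] (220) would overstate the mean (200).
```

The contrast between `best` and `mean` here is exactly the distortion the "top seeds" pitfall describes: the more seeds one trains, the higher the maximum drifts, while the mean stays an unbiased estimate.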