TLDR
Abdulrahman Alabduljabbar, Ahmed A. Abusnaina, Ülkü Meteriz-Yildiran, David A. Mohaisen
DOI: 10.1145/3463676.3485608 (https://doi.org/10.1145/3463676.3485608)
Proceedings of the 20th Workshop on Privacy in the Electronic Society, published 2021-11-15
Citations: 10
Abstract
[1] Agent image from Wikimedia Commons.
[2] Henderson et al., "Deep Reinforcement Learning That Matters," 2018.
[3] Tucker et al., "The Mirage of Action-Dependent Baselines in Reinforcement Learning," 2018.
[4] Whiteson et al., "Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning," 2011.
[5] Bellemare et al., "The Arcade Learning Environment: An Evaluation Platform for General Agents," 2013.
[6] Riedmiller et al., "Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark," 2007.
[7] Zhang et al., "A Study on Overfitting in Deep Reinforcement Learning," 2018.

Score / discounted return / reward: performance measures are reported inconsistently across results, making them hard to compare.
Sample efficiency: not a reliable measure of how well an algorithm performs unless training conditions are held constant.
Top seeds / best seeds: reporting only the best random seeds found can skew results in your favour [4] (see the sketch after this list).
Stochasticity of policy: explicitly state whether the evaluated policy was stochastic or deterministic.
Environment start states: some labs may not have access to the environment's starting conditions, which makes comparative evaluations unfair.

[Poster panel titles: "Evaluation Details" and "Training Details".]
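A minimal sketch of two of the practices above: aggregate over every random seed rather than cherry-picking the best ones, and state explicitly which performance measure (raw score vs. discounted return) is being reported. The toy rollout, the seed list, and the GAMMA value are illustrative assumptions, not taken from the poster.

```python
import random
import statistics

GAMMA = 0.99  # discount factor (assumed value for illustration)

def run_episode(seed, num_steps=100):
    """Toy rollout: random rewards stand in for a real environment and policy."""
    rng = random.Random(seed)
    rewards = [rng.uniform(0.0, 1.0) for _ in range(num_steps)]
    score = sum(rewards)  # undiscounted "score"
    disc_return = sum(r * GAMMA ** t for t, r in enumerate(rewards))  # discounted return
    return score, disc_return

def evaluate(seeds):
    """Aggregate over ALL seeds; never report only the top ones."""
    scores, returns = zip(*(run_episode(s) for s in seeds))
    return {
        "score_mean": statistics.mean(scores),
        "score_std": statistics.stdev(scores),
        "return_mean": statistics.mean(returns),
        "return_std": statistics.stdev(returns),
        "num_seeds": len(seeds),
    }

if __name__ == "__main__":
    # Fixed, pre-registered seed list, reported in full: best and worst alike.
    print(evaluate(seeds=list(range(10))))
```

Reporting both the mean and the standard deviation over a fixed seed list, and labelling whether the numbers are raw scores or discounted returns, avoids the inconsistent-measure and best-seed pitfalls the list describes.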