TLDR
Abdulrahman Alabduljabbar, Ahmed A. Abusnaina, Ülkü Meteriz-Yildiran, David A. Mohaisen
DOI: 10.1145/3463676.3485608 (https://doi.org/10.1145/3463676.3485608)
Proceedings of the 20th Workshop on Privacy in the Electronic Society, published 2021-11-15
Citations: 10
Abstract
[1] Agent image from Wikimedia Commons.
[2] Henderson et al., "Deep Reinforcement Learning That Matters," 2018.
[3] Tucker et al., "The Mirage of Action-Dependent Baselines in Reinforcement Learning," 2018.
[4] Whiteson et al., "Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning," 2011.
[5] Bellemare et al., "The Arcade Learning Environment: An Evaluation Platform for General Agents," 2013.
[6] Riedmiller et al., "Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark," 2007.
[7] Zhang et al., "A Study on Overfitting in Deep Reinforcement Learning," 2018.

Score / discounted return / reward: performance measures are reported inconsistently across results, making them hard to compare.
Sample efficiency: not a reliable measure of how well an algorithm performs unless training conditions are held constant.
Top seeds / best seeds: reporting only the best random seeds found can skew results in your favour [4] (see the sketch after this list).
Stochasticity of policy: explicitly state whether the evaluated policy was stochastic or deterministic.
Environment start states: some labs may not have access to the environment's starting conditions, which makes comparative evaluations unfair.

[Poster panel titles: "Evaluation Details" and "Training Details".]
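A minimal sketch of two of the practices above: aggregate over every random seed rather than cherry-picking the best ones, and state explicitly which performance measure (raw score vs. discounted return) is being reported. The toy rollout, the seed list, and the GAMMA value are illustrative assumptions, not taken from the poster.

```python
import random
import statistics

GAMMA = 0.99  # discount factor (assumed value for illustration)

def run_episode(seed, num_steps=100):
    """Toy rollout: random rewards stand in for a real environment and policy."""
    rng = random.Random(seed)
    rewards = [rng.uniform(0.0, 1.0) for _ in range(num_steps)]
    score = sum(rewards)  # undiscounted "score"
    disc_return = sum(r * GAMMA ** t for t, r in enumerate(rewards))  # discounted return
    return score, disc_return

def evaluate(seeds):
    """Aggregate over ALL seeds; never report only the top ones."""
    scores, returns = zip(*(run_episode(s) for s in seeds))
    return {
        "score_mean": statistics.mean(scores),
        "score_std": statistics.stdev(scores),
        "return_mean": statistics.mean(returns),
        "return_std": statistics.stdev(returns),
        "num_seeds": len(seeds),
    }

if __name__ == "__main__":
    # Fixed, pre-registered seed list, reported in full: best and worst alike.
    print(evaluate(seeds=list(range(10))))
```

Reporting both the mean and the standard deviation over a fixed seed list, and labelling whether the numbers are raw scores or discounted returns, avoids the inconsistent-measure and best-seed pitfalls the list describes.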