Solving the Lunar Lander Problem using Reinforcement Learning
Rohit Sachin Sadavarte, Rishab Raj, B. Sathish Babu
2021 IEEE International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS)
Published: 2021-12-16
DOI: 10.1109/CSITSS54238.2021.9682970 (https://doi.org/10.1109/CSITSS54238.2021.9682970)
Citations: 1
Abstract
Reinforcement learning is an area of machine learning in which an agent learns to solve a problem from feedback, with the goal of maximizing some form of cumulative long-term reward. In this paper, two reinforcement learning techniques, one from the value-based family and one from the policy-gradient family, are implemented and analyzed: Deep Q-Learning and Policy Gradient, respectively. Both are evaluated in OpenAI Gym’s LunarLander environment, and a comparative analysis is then performed to understand how the two techniques differ in a deterministic, episodic state space. Both algorithms are model-free: they can be applied irrespective of the environment and need no knowledge of its exact details, so the comparison can be extended to any other environment that shares these characteristics.
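To make the contrast between the two families concrete, the following is a minimal sketch of the learning signal each one typically uses. It is not the authors' implementation: the function names `td_target` and `discounted_returns` are hypothetical, the discount factor `gamma` is an assumed hyperparameter, and the policy-gradient side is assumed to be a Monte Carlo (REINFORCE-style) variant, which the abstract does not specify. A value-based method such as Deep Q-Learning regresses its Q-network toward a one-step bootstrapped target, whereas such a policy-gradient method weights the log-probability of each action by the discounted return observed over the whole episode.

```python
from typing import List, Sequence


def td_target(reward: float, next_q_values: Sequence[float], done: bool,
              gamma: float = 0.99) -> float:
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    next_q_values holds the Q-estimates for every action in the next state.
    At the end of an episode there is no next state, so nothing is bootstrapped.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1} for one finished episode.

    Computed backwards so each step reuses the return of the step after it.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

Neither function uses a model of the environment's dynamics. Both work only from sampled rewards (and, for the Q-learning target, the network's own estimates), which is the sense in which the abstract calls the two algorithms model-free.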