{"title":"Data augmented Approach to Optimizing Asynchronous Actor-Critic Methods","authors":"S. N., Pradyumna Rahul K, Vaishnavi Sinha","doi":"10.1109/icdcece53908.2022.9792764","DOIUrl":null,"url":null,"abstract":"Learning from visual observations of an environment is a core and fundamental problem in Reinforcement Learning (RL). Although there have been several advances in the algorithms, especially with the involvement of convolutional neural networks, they are primarily lacking in two aspects: (i) learning efficiency based on observations and (ii) learning generalization. Data augmentation has been shown to be a suitable strategy for enhancing the accuracy of classifier in Deep Learning solutions. With these in mind, this paper describes an implementation of Asynchronous Advantage Actor Critic (A3C) that integrates an optimized approach to observation augmentation policy on each learning batch. This approach is known as Data Augmented Reinforcement Learning (DARL). The proposed approach uses data augmentation to create environment variations to improve the learning policy of A3C with a key idea of data variety and demonstrates a significant improvement over the base implementation, with up to 70% increase in the rewards on several OpenAI Atari benchmarks.","PeriodicalId":417643,"journal":{"name":"2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icdcece53908.2022.9792764","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Learning from visual observations of an environment is a fundamental problem in Reinforcement Learning (RL). Although there have been several algorithmic advances, especially with the adoption of convolutional neural networks, these methods are primarily lacking in two aspects: (i) learning efficiency from observations and (ii) generalization of what is learned. Data augmentation has been shown to be an effective strategy for improving classifier accuracy in Deep Learning. With this in mind, this paper describes an implementation of Asynchronous Advantage Actor-Critic (A3C) that applies an optimized observation-augmentation policy to each learning batch, an approach referred to as Data Augmented Reinforcement Learning (DARL). DARL uses data augmentation to create variations of the environment observations, relying on data variety to improve the policy learned by A3C, and demonstrates a significant improvement over the base implementation, with up to a 70% increase in rewards on several OpenAI Atari benchmarks.
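To make the per-batch augmentation idea concrete, the sketch below shows one way such a step could look: a random pad-and-crop ("random shift") transform applied to a batch of Atari observations before the actor-critic update. This is a minimal illustration, not the authors' code; the specific augmentation, the `random_shift` function, and the `worker` API are assumptions for exposition only, since the abstract does not specify the exact augmentation policy DARL uses.

```python
# Minimal sketch (assumption, not the paper's implementation): augmenting each
# observation batch before an A3C-style update, in the spirit of DARL.
import numpy as np


def random_shift(obs_batch: np.ndarray, pad: int = 4) -> np.ndarray:
    """Randomly translate each observation by up to `pad` pixels (pad-and-crop)."""
    n, h, w, c = obs_batch.shape
    padded = np.pad(obs_batch, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.empty_like(obs_batch)
    for i in range(n):
        top = np.random.randint(0, 2 * pad + 1)
        left = np.random.randint(0, 2 * pad + 1)
        out[i] = padded[i, top:top + h, left:left + w, :]
    return out


def augmented_a3c_step(worker, batch_obs, batch_actions, batch_returns):
    """Augment the worker's batch, then perform the usual A3C update.

    `worker.actor_critic_loss` and `worker.apply_gradients` are hypothetical
    placeholders for an A3C worker's loss computation and asynchronous
    gradient push to the shared parameters.
    """
    aug_obs = random_shift(batch_obs)  # vary the observations for this batch
    loss = worker.actor_critic_loss(aug_obs, batch_actions, batch_returns)
    worker.apply_gradients(loss)


if __name__ == "__main__":
    # Example: a batch of 32 stacked Atari frames (84x84 pixels, 4-frame stack).
    batch = np.random.randint(0, 255, size=(32, 84, 84, 4), dtype=np.uint8)
    print(random_shift(batch).shape)  # -> (32, 84, 84, 4)
```

The key design point illustrated here is that the augmentation is applied only to the observations fed into the loss, so the A3C update rule itself is unchanged; the agent simply sees more varied views of the same trajectories.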