{"title":"A Study on Application of Reinforcement Learning Algorithm Using Pixel Data","authors":"S. Moon, Yongchan Choi","doi":"10.9716/KITS.2016.15.4.085","DOIUrl":null,"url":null,"PeriodicalId":272384,"journal":{"name":"Journal of the Korea society of IT services","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Korea society of IT services","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.9716/KITS.2016.15.4.085","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Study on Application of Reinforcement Learning Algorithm Using Pixel Data
Submitted: October 17, 2016; 1st Revision: October 26, 2016; Accepted: October 28, 2016

* This research was conducted as a result of the SW Specialized Graduate School support program of the Ministry of Science, ICT and Future Planning and the Institute for Information & communications Technology Promotion (project number: R0346-16-1010).
** Master's student, Graduate School of Software Specialization, Soongsil University; corresponding author
*** Professor, Graduate School of Software Specialization, Soongsil University

Recently, deep learning and machine learning have attracted considerable attention, and many supporting frameworks have appeared. In the field of artificial intelligence, a large body of research is underway to apply this knowledge to complex problem-solving, which requires applying various learning algorithms and training methods to artificial intelligence systems. In addition, performance evaluations of decision-making agents remain scarce. The decision-making agent designed in this research uses reinforcement learning methods to find optimal solutions: it collects raw pixel data observed from dynamic environments and makes decisions on its own based on that data. The agent uses convolutional neural networks to classify the situations it confronts, and the data observed from the environment is preprocessed before being used. This research presents how the convolutional neural networks and the decision-making agent are configured, analyzes learning performance with a value-based algorithm, Deep Q-Networks (DQN), and a policy-based algorithm, Policy Gradient (PG), sets forth their differences, and demonstrates how the convolutional neural networks affect overall learning performance when using pixel data. This research is expected to contribute to the improvement of artificial intelligence systems that can efficiently find optimal solutions by using features extracted from raw pixel data.

Keywords: Artificial Intelligence, Reinforcement Learning, CNN (Convolutional Neural Networks), DQN (Deep Q-Networks), PG (Policy Gradient)

Journal of the Korea Society of IT Services, Vol. 15, No. 4, December 2016, pp. 85-95
Saemaro Moon, Yonglak Choi
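The abstract does not specify the preprocessing steps or the exact learning targets, so the following is only a minimal sketch under stated assumptions. It illustrates one common way to preprocess a raw RGB frame (grayscale conversion with the standard BT.601 luma weights plus strided downsampling) and contrasts the quantity a value-based method such as DQN regresses toward with the quantity a basic policy-gradient (REINFORCE-style) method weights its updates by. The function names `preprocess`, `dqn_target`, and `discounted_returns`, the `stride` parameter, and the default `gamma` are hypothetical and not taken from the paper.

```python
def preprocess(frame, stride=2):
    """Convert an RGB frame (rows of (r, g, b) tuples, 0-255) to grayscale
    in [0, 1] and keep every `stride`-th pixel in both directions.
    Assumption: the paper's actual preprocessing may differ."""
    out = []
    for row in frame[::stride]:
        out.append([
            (0.299 * r + 0.587 * g + 0.114 * b) / 255.0
            for (r, g, b) in row[::stride]
        ])
    return out


def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Value-based one-step target: r + gamma * max over a' of Q(s', a').
    `next_q_values` stands in for the network's output on the next state."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def discounted_returns(rewards, gamma=0.99):
    """Policy-gradient returns G_t = r_t + gamma * G_{t+1}, computed
    backwards over one finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

In this sketch the value-based target can be computed after every single transition, while the policy-gradient returns need a complete episode, which is one practical difference between the two families of algorithms compared in the paper.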