A Study on Application of Reinforcement Learning Algorithm Using Pixel Data

Journal of the Korea society of IT services, Pub Date: 2016-12-31, DOI: 10.9716/KITS.2016.15.4.085

S. Moon, Yongchan Choi

Citations: 0

Abstract

Submitted: October 17, 2016; 1st Revision: October 26, 2016; Accepted: October 28, 2016

* This research was conducted as part of the SW-specialized graduate school support program of the Ministry of Science, ICT and Future Planning and the Institute for Information & communications Technology Promotion (Project No. R0346-16-1010).
** Master's student, Graduate School of Software, Soongsil University; corresponding author
*** Professor, Graduate School of Software, Soongsil University

Recently, deep learning and machine learning have attracted considerable attention, and many supporting frameworks have appeared. In the field of artificial intelligence, a large body of research is underway to apply this knowledge to complex problem-solving, which requires applying various learning algorithms and training methods to artificial intelligence systems. However, performance evaluations of decision-making agents remain scarce. The decision-making agent designed in this research finds optimal solutions using reinforcement learning: it collects raw pixel data observed from dynamic environments and makes decisions on its own based on those data. The agent uses convolutional neural networks to classify the situations it encounters, and the data observed from the environment are preprocessed before use. This paper describes how the convolutional neural networks and the decision-making agent are configured, analyzes learning performance with a value-based algorithm (Deep Q-Networks) and a policy-based algorithm (Policy Gradient), sets out their differences, and demonstrates how the convolutional neural networks affect overall learning performance when pixel data are used. This research is expected to contribute to improving artificial intelligence systems that efficiently find optimal solutions using features extracted from raw pixel data.

Keywords: Artificial Intelligence, Reinforcement Learning, CNN (Convolutional Neural Networks), DQN (Deep Q-Networks), PG (Policy Gradient)

Journal of the Korea society of IT services, Vol. 15, No. 4, December 2016, pp. 85-95. Saemaro Moon, Yonglak Choi
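To make the value-based versus policy-based distinction concrete, the following is a minimal Python sketch of the learning signal each algorithm optimizes. It is not the authors' implementation. The paper's network architecture, preprocessing, and exact Policy Gradient variant are not given here, so this sketch assumes discrete actions, REINFORCE with a softmax policy as the Policy Gradient method, and plain lists of Q-values and logits standing in for the outputs of a CNN over preprocessed pixel frames. All names (dqn_td_target, dqn_loss, discounted_returns, reinforce_logit_grad, GAMMA) are hypothetical.

```python
"""Sketch: learning signals of DQN (value-based) vs. REINFORCE (policy-based).

Assumptions, not from the paper: discrete actions, a hypothetical discount
factor GAMMA, and Q-values / policy logits that a CNN over preprocessed pixel
frames would normally produce, represented here as plain Python lists.
"""
import math
from typing import List, Sequence

GAMMA = 0.99  # hypothetical discount factor


def dqn_td_target(reward: float, next_q_values: Sequence[float],
                  done: bool, gamma: float = GAMMA) -> float:
    """Value-based target: y = r + gamma * max_a' Q_target(s', a').

    next_q_values are assumed to come from a separate target network.
    At a terminal transition there is no bootstrap term, so y = r.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def dqn_loss(q_values: Sequence[float], action: int, target: float) -> float:
    """Squared TD error for the action that was actually taken."""
    return (target - q_values[action]) ** 2


def discounted_returns(rewards: Sequence[float],
                       gamma: float = GAMMA) -> List[float]:
    """Policy-based signal: G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to action probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_logit_grad(logits: Sequence[float], action: int,
                         ret: float) -> List[float]:
    """Gradient of G * log pi(a|s) w.r.t. the logits: G * (onehot(a) - pi).

    Gradient ascent along this direction raises the probability of actions
    that were followed by high returns.
    """
    probs = softmax(logits)
    return [ret * ((1.0 if i == action else 0.0) - p)
            for i, p in enumerate(probs)]


if __name__ == "__main__":
    # DQN: learn from a single transition (s, a, r, s').
    y = dqn_td_target(reward=1.0, next_q_values=[0.5, 2.0], done=False, gamma=0.5)
    print(y)  # 2.0
    print(dqn_loss([1.0, 0.0], action=0, target=y))  # 1.0
    # Policy Gradient: learn from a completed episode's rewards.
    print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
    print(reinforce_logit_grad([0.0, 0.0], action=1, ret=2.0))  # [-1.0, 1.0]
```

The contrast lies in what is learned: DQN regresses Q(s, a) toward a bootstrapped one-step target and acts greedily on those values, while Policy Gradient adjusts action probabilities directly, weighted by the Monte Carlo return of a completed episode.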
