{"title":"使用深度Q-Networks训练Agent在Unity ML-Agents香蕉环境中导航","authors":"Oluwaseyi (Tony) Awoga CPA, PRM","doi":"10.2139/ssrn.3881878","DOIUrl":null,"url":null,"abstract":"Deep Q-learning is the combination of the Q-learning process with a function approximation technique such as a neural network. According to (Zai & Brown 2020), the main idea behind Q-learning is the use of an algorithm to predict a state-action pair, and to then compare the results generated from this prediction to the observed accumulated rewards at some later time. The parameters of the algorithms are then updated so that it makes better predictions next time. While this technique has some advantages that make it very useful for solving reinforcement learning problems, it also falls short for solving complex problems with large state-space. In fact, (Google DeepMind 2015) supported the above conclusion in its seminal paper entitled “Human-level control through deep reinforcement learning”. In this paper, (Mnih et al 2015) asserted that “to use reinforcement learning successfully in situations approaching real-world complexity, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experiences to new situations”. To achieve this objective they stated further, “we developed a novel agent, a deep Q-network (DQN), which is able to combine reinforcement learning with a class of artificial neural network known as deep neural networks”. While Q-learning as a tool for solving reinforcement learning problems has enjoyed some remarkable successes in the past, it was not until the introduction of DQN that practitioners were able to use it to solve large-scale problems. Prior to that, reinforcement learning was limited to “applications and domains in which useful features could be handcrafted, or to domains with fully observed, low-dimensional state spaces”, (Mnih et al 2015) argued further.","PeriodicalId":114865,"journal":{"name":"ERN: Neural Networks & Related Topics (Topic)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Using Deep Q-Networks to Train an Agent to Navigate the Unity ML-Agents Banana Environment\",\"authors\":\"Oluwaseyi (Tony) Awoga CPA, PRM\",\"doi\":\"10.2139/ssrn.3881878\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep Q-learning is the combination of the Q-learning process with a function approximation technique such as a neural network. According to (Zai & Brown 2020), the main idea behind Q-learning is the use of an algorithm to predict a state-action pair, and to then compare the results generated from this prediction to the observed accumulated rewards at some later time. The parameters of the algorithms are then updated so that it makes better predictions next time. While this technique has some advantages that make it very useful for solving reinforcement learning problems, it also falls short for solving complex problems with large state-space. In fact, (Google DeepMind 2015) supported the above conclusion in its seminal paper entitled “Human-level control through deep reinforcement learning”. 
In this paper, (Mnih et al 2015) asserted that “to use reinforcement learning successfully in situations approaching real-world complexity, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experiences to new situations”. To achieve this objective they stated further, “we developed a novel agent, a deep Q-network (DQN), which is able to combine reinforcement learning with a class of artificial neural network known as deep neural networks”. While Q-learning as a tool for solving reinforcement learning problems has enjoyed some remarkable successes in the past, it was not until the introduction of DQN that practitioners were able to use it to solve large-scale problems. Prior to that, reinforcement learning was limited to “applications and domains in which useful features could be handcrafted, or to domains with fully observed, low-dimensional state spaces”, (Mnih et al 2015) argued further.\",\"PeriodicalId\":114865,\"journal\":{\"name\":\"ERN: Neural Networks & Related Topics (Topic)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ERN: Neural Networks & Related Topics (Topic)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2139/ssrn.3881878\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ERN: Neural Networks & Related Topics (Topic)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.3881878","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Deep Q-learning is the combination of the Q-learning process with a function approximation technique such as a neural network. According to Zai and Brown (2020), the main idea behind Q-learning is to use an algorithm to predict the value of a state-action pair and then compare that prediction to the accumulated rewards observed at some later time; the algorithm's parameters are then updated so that it makes better predictions the next time. While this technique has advantages that make it very useful for solving reinforcement learning problems, it falls short when applied to complex problems with large state spaces. Google DeepMind (2015) reached the same conclusion in its seminal paper "Human-level control through deep reinforcement learning". In that paper, Mnih et al. (2015) asserted that "to use reinforcement learning successfully in situations approaching real-world complexity, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experiences to new situations". To achieve this objective, they continued, "we developed a novel agent, a deep Q-network (DQN), which is able to combine reinforcement learning with a class of artificial neural network known as deep neural networks". While Q-learning as a tool for solving reinforcement learning problems has enjoyed some remarkable successes in the past, it was not until the introduction of DQN that practitioners were able to apply it to large-scale problems. Prior to that, Mnih et al. (2015) argued, reinforcement learning was limited to "applications and domains in which useful features could be handcrafted, or to domains with fully observed, low-dimensional state spaces".
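The core idea described above — predicting a state-action value with a neural network, comparing it to the rewards observed later, and updating the network's parameters — can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch example of a single DQN update step; the network architecture, hyperparameters, and the 37-dimensional state / 4-action dimensions commonly cited for the Unity ML-Agents Banana environment are assumptions for illustration, not the author's implementation.

```python
# Minimal, illustrative DQN update step (PyTorch). Dimensions and hyperparameters
# are assumptions for the Unity ML-Agents Banana environment, not taken from the paper.
import torch
import torch.nn as nn
import torch.optim as optim

STATE_SIZE = 37   # assumed Banana observation size
ACTION_SIZE = 4   # assumed discrete actions (forward, backward, turn left, turn right)
GAMMA = 0.99      # discount factor

# Q-network: a small fully connected function approximator for Q(s, a)
q_network = nn.Sequential(
    nn.Linear(STATE_SIZE, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, ACTION_SIZE),
)
# Separate target network, periodically synced, as in Mnih et al. (2015)
target_network = nn.Sequential(
    nn.Linear(STATE_SIZE, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, ACTION_SIZE),
)
target_network.load_state_dict(q_network.state_dict())
optimizer = optim.Adam(q_network.parameters(), lr=5e-4)

def dqn_update(states, actions, rewards, next_states, dones):
    """One gradient step: move Q(s, a) toward r + gamma * max_a' Q_target(s', a')."""
    # Predicted Q-values for the actions that were actually taken
    q_pred = q_network(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # TD target built from the observed reward and the target network's estimate
    with torch.no_grad():
        q_next = target_network(next_states).max(dim=1).values
        q_target = rewards + GAMMA * q_next * (1.0 - dones)
    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full agent, this update would be applied to mini-batches sampled from an experience-replay buffer, and the target network's weights would be copied from (or soft-updated toward) the Q-network at intervals — the two stabilizing techniques introduced in Mnih et al. (2015).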