Autonomous Navigation via a Deep Q Network with One-Hot Image Encoding

Will Anderson, Kevin Carey, Eric M. Sturzinger, Christopher J. Lowrance
{"title":"Autonomous Navigation via a Deep Q Network with One-Hot Image Encoding","authors":"Will Anderson, Kevin Carey, Eric M. Sturzinger, Christopher J. Lowrance","doi":"10.1109/ISMCR47492.2019.8955697","DOIUrl":null,"url":null,"abstract":"Common autonomous driving techniques employ various combinations of convolutional and deep neural networks to safely and efficiently navigate unique road and traffic conditions. This paper investigates the feasibility of employing a reinforcement learning (RL) model for autonomous navigation using a low dimensional input. While many navigation applications generate each individual state as a function of a frame's raw pixel information, we use a deep Q network (DQN) with reduced input dimensionality to train a mobile robot to continuously remain within a lane around an elliptical track. We accomplish this by using a one-hot encoding scheme that assigns a binary variable to each element in a square array. This value is a function of whether the input frame detects the presence of a lane boundary. Our ultimate goal was to determine the minimum number of training samples required to consistently train the robot to complete one cycle around the track, from multiple starting positions and directions, without crossing a lane boundary. 
We found that by intelligently balancing exploration and exploitation of its environment, as well as the rewards for staying in the lane, the robot was able to achieve its goal with a small number of samples.","PeriodicalId":423631,"journal":{"name":"2019 IEEE International Symposium on Measurement and Control in Robotics (ISMCR)","volume":"174 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Symposium on Measurement and Control in Robotics (ISMCR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISMCR47492.2019.8955697","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Common autonomous driving techniques employ various combinations of convolutional and deep neural networks to safely and efficiently navigate unique road and traffic conditions. This paper investigates the feasibility of employing a reinforcement learning (RL) model for autonomous navigation using a low dimensional input. While many navigation applications generate each individual state as a function of a frame's raw pixel information, we use a deep Q network (DQN) with reduced input dimensionality to train a mobile robot to continuously remain within a lane around an elliptical track. We accomplish this by using a one-hot encoding scheme that assigns a binary variable to each element in a square array. This value is a function of whether the input frame detects the presence of a lane boundary. Our ultimate goal was to determine the minimum number of training samples required to consistently train the robot to complete one cycle around the track, from multiple starting positions and directions, without crossing a lane boundary. We found that by intelligently balancing exploration and exploitation of its environment, as well as the rewards for staying in the lane, the robot was able to achieve its goal with a small number of samples.
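The one-hot scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grid size, frame resolution, and the upstream lane-boundary detector are all assumptions, since the abstract specifies only that each element of a square array holds a binary value indicating whether a lane boundary was detected there.

```python
import numpy as np

def encode_frame(lane_mask, grid_size=4):
    """Encode a binary lane-boundary mask into a low-dimensional state.

    lane_mask: 2-D boolean array, True where a lane boundary was detected
               in the camera frame (detector itself is assumed, e.g. a
               simple color or edge filter).
    Returns a flat binary vector with one element per grid cell: 1 if any
    boundary pixel falls inside that cell, else 0.
    """
    h, w = lane_mask.shape
    state = np.zeros((grid_size, grid_size), dtype=np.int8)
    for i in range(grid_size):
        for j in range(grid_size):
            # Slice out the sub-region of the frame covered by cell (i, j)
            cell = lane_mask[i * h // grid_size:(i + 1) * h // grid_size,
                             j * w // grid_size:(j + 1) * w // grid_size]
            state[i, j] = 1 if cell.any() else 0
    return state.flatten()

# Example: a lane boundary running down the left side of a 64x64 frame
mask = np.zeros((64, 64), dtype=bool)
mask[:, 4:8] = True
print(encode_frame(mask))  # -> [1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0]
```

Feeding the DQN this 16-element binary vector instead of 64 x 64 raw pixels is what the abstract means by reduced input dimensionality: the Q network's input layer shrinks from thousands of pixel values to one unit per grid cell, which is consistent with the paper's goal of training from a small number of samples.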