Modified Q-Learning Algorithm for Mobile Robot Real-Time Path Planning using Reduced States

Hidayat, A. Buono, K. Priandana, S. Wahjuni
{"title":"Modified Q-Learning Algorithm for Mobile Robot Real-Time Path Planning using Reduced States","authors":"Hidayat, A. Buono, K. Priandana, S. Wahjuni","doi":"10.29207/resti.v7i3.4949","DOIUrl":null,"url":null,"abstract":"Path planning is an essential algorithm in any autonomous mobile robot, including agricultural robots. One of the reinforcement learning methods that can be used for mobile robot path planning is the Q-Learning algorithm. However, the conventional Q-learning method explores all possible robot states in order to find the most optimum path. Thus, this method requires extensive computational cost especially when there are considerable grids to be computed. This study modified the original Q-Learning algorithm by removing the impassable area, so that these areas are not considered as grids to be computed. This modified Q-Learning method was simulated as path finding algorithm for autonomous mobile robot operated at the Agribusiness and Technology Park (ATP), IPB University. Two simulations were conducted to compare the original Q-Learning method and the modified Q-Learning method. The simulation results showed that the state reductions in the modified Q-Learning method can lower the computation cost to 50.71% from the computation cost of the original Q-Learning method, that is, an average computation time of 25.74s as compared to 50.75s, respectively. Both methods produce similar number of states as the robot’s optimal path, i.e. 56 states, based on the reward obtained by the robot while selecting the path. However, the modified Q-Learning algorithm is capable of finding the path to the destination point with a minimum learning rate parameter value of 0.2 when the discount factor value is 0.9.","PeriodicalId":435683,"journal":{"name":"Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.29207/resti.v7i3.4949","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Path planning is an essential algorithm for any autonomous mobile robot, including agricultural robots. One reinforcement learning method that can be used for mobile robot path planning is the Q-Learning algorithm. However, conventional Q-Learning explores all possible robot states in order to find the optimal path, so its computational cost becomes substantial when many grid cells must be computed. This study modifies the original Q-Learning algorithm by removing impassable areas from the state space, so that these areas are not included among the grid cells to be computed. The modified Q-Learning method was simulated as the path-finding algorithm for an autonomous mobile robot operated at the Agribusiness and Technology Park (ATP), IPB University. Two simulations were conducted to compare the original and the modified Q-Learning methods. The results show that the state reduction in the modified method lowers the computation cost to 50.71% of that of the original method: an average computation time of 25.74 s versus 50.75 s, respectively. Both methods produce optimal paths of the same length, 56 states, based on the reward obtained by the robot while selecting the path. However, the modified Q-Learning algorithm requires a minimum learning-rate value of 0.2 to find the path to the destination point when the discount factor is 0.9.
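The state-reduction idea described in the abstract can be illustrated with a short sketch: a standard tabular Q-Learning loop in which impassable cells are simply never entered into the Q-table, so updates iterate only over passable states. This is a minimal illustration, not the authors' implementation; the grid layout, reward values, and epsilon-greedy exploration rate are assumptions, while the learning rate (0.2) and discount factor (0.9) follow the values reported in the abstract.

```python
# Minimal sketch of Q-Learning with a reduced state set: impassable
# cells ('#') are excluded from the Q-table, so they are never computed.
# Grid, rewards, and exploration rate are illustrative assumptions.
import random

GRID = [
    "S....",
    ".##..",
    ".##..",
    "....G",
]  # '#' = impassable area, 'S' = start, 'G' = goal

ROWS, COLS = len(GRID), len(GRID[0])
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

# State reduction: build the Q-table only over passable cells.
passable = {(r, c) for r in range(ROWS) for c in range(COLS) if GRID[r][c] != "#"}
Q = {s: [0.0] * len(ACTIONS) for s in passable}

start = next(s for s in passable if GRID[s[0]][s[1]] == "S")
goal = next(s for s in passable if GRID[s[0]][s[1]] == "G")

alpha, gamma = 0.2, 0.9   # learning rate and discount factor from the abstract
epsilon = 0.1             # exploration rate (assumed)

def step(s, a):
    """Move if the target cell is passable; otherwise stay put with a penalty."""
    nr, nc = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
    if (nr, nc) not in passable:       # off-grid or impassable area
        return s, -1.0
    if (nr, nc) == goal:
        return (nr, nc), 10.0
    return (nr, nc), -0.1              # small step cost favors short paths

for episode in range(500):
    s = start
    while s != goal:
        if random.random() < epsilon:  # epsilon-greedy action selection
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
        s2, r = step(s, a)
        # Q-Learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
```

After training, greedily following argmax over Q from the start cell traces the learned path; the point of the modification is that Q holds no entries for the '#' cells, which is the state reduction the paper describes.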