Dynamic Adjustment of Reward Function for Proximal Policy Optimization with Imitation Learning: Application to Automated Parking Systems

2022 IEEE Intelligent Vehicles Symposium (IV) | Pub Date: 2022-06-05 | DOI: 10.1109/iv51971.2022.9827194

Mohamad Albilani, A. Bouzeghoub

Citations: 1

Abstract

Automated Parking Systems (APS) are responsible for performing a parking maneuver safely and time-efficiently, with full autonomy. These systems mainly comprise three components: parking spot exploration, path planning, and path tracking. In the literature, there are several path planning and tracking methods, and the application of reinforcement learning to them is widespread. However, performance tuning and ensuring efficiency remain significant open problems. Moreover, these methods suffer from the non-linearity of vehicle dynamics, which causes deviation from the original route, and they do not respect the BS ISO 16787-2017 standard, which outlines the minimum requirements for APS. To overcome these limitations, our contribution in this paper, named DPPO-IL, is fourfold: (i) a new framework using the Proximal Policy Optimization (PPO) algorithm, allowing the agent to search for an empty parking spot, then plan and park a car in a random parking spot while avoiding static and dynamic obstacles; (ii) a dynamic adjustment of the reward function using intrinsic reward signals to induce the agent to explore more; (iii) an approach to learn policies from expert demonstrations using imitation learning combined with deep reinforcement learning, to speed up the learning phase and reduce the training time; (iv) a task-specific curriculum learning scheme to train the agent in a very complex environment. Experiments show promising results: our approach achieved a 90% success rate, and 97% of these successful attempts were aligned with the parking spot, with an inclination angle greater than ±0.2° and a deviation less than 0.1 meter. These results exceeded the state of the art while respecting the ISO 16787-2017 standard.
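Contribution (ii) can be illustrated with a small sketch. The abstract does not specify how the intrinsic signal is computed or how its weight is adjusted, so everything below is an assumption made for illustration only: a count-based novelty bonus stands in for the intrinsic reward, and a linearly decaying weight stands in for the dynamic adjustment. The names `RewardMixer`, `count_bonus`, `beta_start`, `beta_end`, and `decay_steps` are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class RewardMixer:
    """Blend an extrinsic task reward with an intrinsic exploration bonus.

    All parameters are hypothetical; the paper's actual schedule is not
    given in the abstract.
    """
    beta_start: float = 1.0   # assumed intrinsic weight at the start of training
    beta_end: float = 0.05    # assumed intrinsic weight after the decay
    decay_steps: int = 1000   # assumed number of steps over which the weight decays

    def beta(self, step: int) -> float:
        # Clamp the step into [0, decay_steps], then interpolate linearly.
        frac = min(max(step, 0), self.decay_steps) / self.decay_steps
        return (1.0 - frac) * self.beta_start + frac * self.beta_end

    def total(self, extrinsic: float, intrinsic: float, step: int) -> float:
        # The reward passed to the policy-gradient update (e.g. PPO).
        return extrinsic + self.beta(step) * intrinsic


def count_bonus(counts: dict, state_key) -> float:
    """Count-based novelty bonus: 1 / sqrt(visit count).

    This is a stand-in for an intrinsic reward signal, not the paper's method.
    `state_key` is any hashable summary of the state, such as a grid cell.
    """
    counts[state_key] = counts.get(state_key, 0) + 1
    return 1.0 / math.sqrt(counts[state_key])


if __name__ == "__main__":
    mixer = RewardMixer()
    visits = {}
    for step in (0, 500, 1000):
        bonus = count_bonus(visits, state_key=(3, 4))
        print(step, mixer.total(extrinsic=-0.1, intrinsic=bonus, step=step))
```

Under these assumptions, the exploration bonus dominates early in training and fades as the weight decays, so the task reward increasingly drives the final policy.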
