Trajectory Optimization on Safety, Length and Smoothness in Complex Environments with A Locally Trained and Globally Working Agent
Qianyi Zhang, Dingye Yang, Lei Zhou, Zhengxi Hu, Jingtai Liu
2022 IEEE International Conference on Real-time Computing and Robotics (RCAR), published 2022-07-17
DOI: 10.1109/RCAR54675.2022.9872237
Citations: 0
Abstract
Focused on balancing safety, length, and smoothness, this paper proposes a novel model that trains an agent with deep reinforcement learning to optimize trajectories in complex environments. Inspired by the human habit of first finding the shortest trajectory and then slightly adjusting it for safety and smoothness, the state is initialized as a radical (near-shortest) trajectory combined with the local obstacle distribution. The action adjusts dangerous waypoints jointly, and the reward penalizes increases in length weighted by the local change in smoothness. Episodes are terminated early to divide the whole problem into smaller sub-problems, while the reward assembles them back together over a large amount of training data. This lets the agent be trained locally yet work globally, which accelerates convergence. Performance in various scenarios demonstrates our method's ability to balance safety, length, and smoothness. Owing to the Markov property of the problem and a newly discovered mathematical property of the B-spline, the method adjusts waypoints at sub-grid resolution and generalizes stably to various maps with dense obstacles.
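The abstract only sketches the reinforcement-learning formulation at a high level, so the code below is one plausible reading of it rather than the authors' implementation: the environment class name, the safety_radius threshold, the joint-offset action format, the specific length/smoothness reward weighting, and the second-difference smoothness proxy are all assumptions introduced here for illustration.

```python
import numpy as np

class TrajectoryRefinementEnv:
    """Minimal sketch of the RL setup described in the abstract.

    Assumptions: 2-D waypoints, a clearance function standing in for the
    local obstacle distribution, and an illustrative reward weighting.
    """

    def __init__(self, waypoints, clearance_fn, safety_radius=0.3):
        # State: a "radical" (near-shortest) initial trajectory plus the
        # local obstacle distribution, represented here by per-waypoint
        # clearance values.
        self.waypoints = np.asarray(waypoints, dtype=float)  # shape (N, 2)
        self.clearance_fn = clearance_fn  # point -> distance to nearest obstacle
        self.safety_radius = safety_radius

    def state(self):
        clearances = np.array([self.clearance_fn(p) for p in self.waypoints])
        return np.concatenate([self.waypoints.ravel(), clearances])

    def step(self, action):
        # Action: joint offsets applied to the dangerous waypoints, i.e.
        # those whose clearance falls below the safety radius.
        clearances = np.array([self.clearance_fn(p) for p in self.waypoints])
        dangerous = clearances < self.safety_radius
        old_length = self._length()
        old_smooth = self._smoothness()
        offsets = np.asarray(action, dtype=float).reshape(-1, 2)
        self.waypoints[dangerous] += offsets[: int(dangerous.sum())]

        # Reward (assumed form): penalize the length increase, scaled by
        # how much the local smoothness degrades.
        length_increase = self._length() - old_length
        smooth_change = self._smoothness() - old_smooth
        reward = -length_increase * (1.0 + max(smooth_change, 0.0))

        # Early termination: once every waypoint is safe, this local
        # sub-problem ends, so the agent trains on small local pieces.
        new_clear = np.array([self.clearance_fn(p) for p in self.waypoints])
        done = bool((new_clear >= self.safety_radius).all())
        return self.state(), reward, done

    def _length(self):
        return float(np.linalg.norm(np.diff(self.waypoints, axis=0), axis=1).sum())

    def _smoothness(self):
        # Sum of squared second differences as a simple smoothness proxy.
        return float((np.diff(self.waypoints, 2, axis=0) ** 2).sum())
```

The early termination shown here mirrors the abstract's idea of splitting the global optimization into local sub-problems; how the authors actually detect a finished sub-problem, and the exact reward coefficients, are not specified in the abstract.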