Agile Flights Through a Moving Narrow Gap for Quadrotors Using Adaptive Curriculum Learning

Impact factor: 14 · Zone 1 (Engineering and Technology) · JCR Q1 (Computer Science, Artificial Intelligence)

IEEE Transactions on Intelligent Vehicles Pub Date : 2024-04-19 DOI:10.1109/TIV.2024.3391384

Mengyun Wang;Shengde Jia;Yifeng Niu;Yunzhuo Liu;Chao Yan;Chang Wang

Vol. 9, No. 11, pp. 6936-6949 · Journal Article · https://ieeexplore.ieee.org/document/10505852/

Citations: 0

Abstract

Fast, agile flight through a gap is challenging for a quadrotor when the gap is narrow, tilted, and moving. Because the position and attitude constraints are strict and time-varying, collision-free traversal trajectories under the under-actuated quadrotor dynamics are sparse and difficult to compute. To accomplish this task in the real world, we propose a Gap-Traversing Adaptive Curriculum Learning (GTACL) approach, which consists of adaptive curriculum reinforcement learning (ACRL) and online thrust updating (OTU). First, ACRL improves sample efficiency and accelerates policy training by designing a curriculum adapted to the agent's capability. Second, OTU maps acceleration commands to low-level throttle signals by estimating the thrust model during flight, which reduces the number of intermediate control variables and helps sim-to-real transfer. We use a prioritized experience replay mechanism that considers both the contribution to the policy update and the data acquisition time to adapt to the changing tasks. GTACL is trained entirely in simulation and can be transferred to other quadrotors with different dynamics. Furthermore, we achieve zero-shot transfer to a real-world quadrotor without fine-tuning. Average success rates of 98% in simulation and 87.8% in real-world experiments under different task conditions demonstrate the robustness of the proposed approach. Comparisons with traditional and related learning-based approaches show the advantages of GTACL in learning efficiency, control performance, and generalization.
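To make the online thrust updating idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear thrust model (thrust acceleration = k × throttle, with throttle in [0, 1]) and estimates the gain k by scalar recursive least squares with a forgetting factor. The class name `ThrustEstimator`, its parameters, and the numeric values are all hypothetical; the abstract does not specify the model form or the estimator.

```python
class ThrustEstimator:
    """Scalar recursive least squares for the ASSUMED model accel = k * throttle."""

    def __init__(self, k_init=20.0, p_init=100.0, forgetting=0.98):
        self.k = k_init              # estimated gain, m/s^2 per unit throttle
        self.p = p_init              # covariance of the estimate
        self.forgetting = forgetting # < 1 discounts older samples

    def update(self, throttle, measured_accel):
        """Refine k from one (throttle, measured thrust acceleration) sample."""
        denom = self.forgetting + throttle * self.p * throttle
        gain = self.p * throttle / denom
        self.k += gain * (measured_accel - self.k * throttle)
        self.p = (self.p - gain * throttle * self.p) / self.forgetting
        return self.k

    def throttle_for(self, accel_cmd):
        """Invert the current model and clip to the valid throttle range."""
        return min(1.0, max(0.0, accel_cmd / self.k))


# Hypothetical usage: the true gain differs from the initial guess,
# and noiseless samples pull the estimate toward it during "flight".
true_k = 25.0
estimator = ThrustEstimator(k_init=20.0)
for step in range(50):
    u = 0.3 + 0.01 * (step % 20)       # varying throttle input
    estimator.update(u, true_k * u)    # simulated accelerometer reading
throttle = estimator.throttle_for(12.5)
```

Under these assumptions the policy can output acceleration commands directly, and the estimator absorbs vehicle-specific effects such as battery sag or a different motor, which is one plausible reading of how such a module could ease transfer between quadrotors.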

Source journal

IEEE Transactions on Intelligent Vehicles (Mathematics: Control and Optimization)

CiteScore: 12.10
Self-citation rate: 13.40%
Articles published: 177

About the journal: The IEEE Transactions on Intelligent Vehicles (T-IV) is a premier platform for publishing peer-reviewed articles that present innovative research concepts, application results, significant theoretical findings, and application case studies in the field of intelligent vehicles. With a particular emphasis on automated vehicles within roadway environments, T-IV aims to raise awareness of pressing research and application challenges. Our focus is on providing critical information to the intelligent vehicle community, serving as a dissemination vehicle for IEEE ITS Society members and others interested in learning about the state-of-the-art developments and progress in research and applications related to intelligent vehicles. Join us in advancing knowledge and innovation in this dynamic field.