SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum

Impact factor: 4.6; Zone 2 (Computer Science); JCR Q2 (Robotics)
JunEn Low;Maximilian Adang;Javier Yu;Keiko Nagami;Mac Schwager
IEEE Robotics and Automation Letters, vol. 10, no. 5, pp. 5122-5129. Published 2025-03-21.
DOI: 10.1109/LRA.2025.3553785
URL: https://ieeexplore.ieee.org/document/10937041/

Abstract

We propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only onboard perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights producing photorealistic images at up to 130 fps. We use FiGS to collect 100k–300k image/state-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image, optical flow, and IMU data streams into low-level thrust and body rate commands at 20 Hz onboard a drone. Crucially, SV-Net includes a learned module for low-level control that adapts at runtime to variations in drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone's visual field.
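The data-collection and distillation steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1D point-mass simulator, the PD "expert" standing in for the MPC, the linear least-squares student standing in for SV-Net, and every name and numeric range below are assumptions chosen only to show the general pattern of cloning a privileged expert under randomized dynamics and disturbances.

```python
import numpy as np


def expert_action(pos, vel, mass, kp=4.0, kd=3.0):
    """Privileged expert (hypothetical PD law standing in for the expert MPC).

    It sees the true state and the true mass, which the student never observes.
    """
    desired_acc = -kp * pos - kd * vel
    return mass * desired_acc  # thrust-like force command


def collect_rollouts(n_rollouts, steps, dt, rng):
    """Roll out the expert in a toy 1D simulator with randomized mass and disturbances."""
    observations, actions = [], []
    for _ in range(n_rollouts):
        mass = rng.uniform(0.7, 1.3)  # assumed +/-30% mass randomization
        pos, vel = rng.uniform(-1.0, 1.0), 0.0
        for _ in range(steps):
            force = expert_action(pos, vel, mass)
            # The student's observation is noisy and excludes the mass.
            obs = np.array([pos, vel]) + rng.normal(0.0, 0.01, size=2)
            observations.append(obs)
            actions.append(force)
            # Random force disturbance pushes the state off the nominal path.
            disturbance = rng.normal(0.0, 0.1)
            acc = (force + disturbance) / mass
            vel += acc * dt
            pos += vel * dt
    return np.array(observations), np.array(actions)


def fit_student(observations, actions):
    """Behavior cloning: least-squares fit of a linear policy to the expert's actions."""
    weights, *_ = np.linalg.lstsq(observations, actions, rcond=None)
    return weights


rng = np.random.default_rng(0)
obs, act = collect_rollouts(n_rollouts=200, steps=50, dt=0.05, rng=rng)
weights = fit_student(obs, act)
print("observation/action pairs:", obs.shape[0])
print("student policy weights:", weights)
```

Unlike the learned low-level control module the abstract attributes to SV-Net, the linear student here cannot adapt at runtime; it only averages over the randomized masses seen during data collection.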
Source journal
IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Articles published: 1428
Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.