SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum

IF 4.6 | Zone 2, Computer Science | JCR Q2, Robotics

IEEE Robotics and Automation Letters Pub Date : 2025-03-21 DOI:10.1109/LRA.2025.3553785

JunEn Low;Maximilian Adang;Javier Yu;Keiko Nagami;Mac Schwager

IEEE Robotics and Automation Letters, vol. 10, no. 5, pp. 5122-5129. Journal Article. https://ieeexplore.ieee.org/document/10937041/

Citations: 0

Abstract

We propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only onboard perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights producing photorealistic images at up to 130 fps. We use FiGS to collect 100 k–300 k image/state-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image, optical flow and IMU data streams into low-level thrust and body rate commands at 20 Hz onboard a drone. Crucially, SV-Net includes a learned module for low-level control that adapts at runtime to variations in drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone's visual field.
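As a rough illustration of the data-collection idea described in the abstract (a privileged expert flown under randomized dynamics and logged as observation/action pairs), a toy sketch might look like the following. It is not FiGS, the expert MPC, or SV-Net: the 1-D altitude model, the PD law standing in for the expert, the gains, and all names (`expert_thrust`, `collect_rollout`, `collect_dataset`) are assumptions made purely for illustration. The 0.05 s step and the 30% mass spread are loosely borrowed from the 20 Hz command rate and the mass variation mentioned in the abstract; the abstract does not state the actual randomization ranges.

```python
import random

G = 9.81  # gravitational acceleration, m/s^2


def expert_thrust(z, vz, z_ref, mass, kp=4.0, kd=3.0):
    """Privileged expert (hypothetical PD law): it is given the true mass."""
    return mass * (G + kp * (z_ref - z) - kd * vz)


def collect_rollout(rng, nominal_mass=1.0, mass_spread=0.3, steps=100, dt=0.05):
    """Simulate one randomized 1-D altitude flight; log (observation, action) pairs."""
    # Randomize the dynamics: the mass differs on every rollout.
    mass = nominal_mass * (1.0 + rng.uniform(-mass_spread, mass_spread))
    # Randomize the start height as a simple spatial disturbance.
    z, vz, z_ref = rng.uniform(-0.5, 0.5), 0.0, 1.0
    pairs = []
    for _ in range(steps):
        thrust = expert_thrust(z, vz, z_ref, mass)
        # The logged observation deliberately omits the mass: a student
        # policy trained on these pairs never sees privileged information.
        pairs.append(((z, vz, z_ref), thrust))
        acc = thrust / mass - G
        vz += acc * dt  # semi-implicit Euler step
        z += vz * dt
    return mass, pairs


def collect_dataset(num_rollouts=20, seed=0):
    """Concatenate the pairs from several independently randomized rollouts."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_rollouts):
        _, pairs = collect_rollout(rng)
        dataset.extend(pairs)
    return dataset
```

A student policy would then be fit to `dataset` by supervised regression from observations to actions; that distillation step is omitted here.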




Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Articles published: 1428

About the journal: The journal publishes peer-reviewed articles that give a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in robotics and automation.