Sim-to-Real via Sim-to-Seg: End-to-end Off-road Autonomous Driving Without Real Data

Conference on Robot Learning. Pub Date: 2022-10-25. DOI: 10.48550/arXiv.2210.14721

John So, Amber Xie, Sunggoo Jung, J. Edlund, Rohan Thakker, Ali-akbar Agha-mohammadi, P. Abbeel, Stephen James


Citations: 1

Abstract

Autonomous driving is complex, requiring sophisticated 3D scene understanding, localization, mapping, and control. Rather than explicitly modelling and fusing each of these components, we consider an end-to-end approach via reinforcement learning (RL). However, collecting exploratory driving data in the real world is impractical and dangerous. While training in simulation and deploying visual sim-to-real techniques has worked well for robot manipulation, deploying beyond controlled workspace viewpoints remains a challenge. In this paper, we address this challenge by presenting Sim2Seg, a re-imagining of RCAN that crosses the visual reality gap for off-road autonomous driving without using any real-world data. This is done by learning to translate randomized simulation images into simulated segmentation and depth maps, which in turn allows real-world images to be translated as well. This allows us to train an end-to-end RL policy in simulation and deploy it directly in the real world. Our approach, which can be trained in 48 hours on 1 GPU, performs as well as a classical perception and control stack that took thousands of engineering hours over several months to build. We hope this work motivates future end-to-end autonomous driving research.
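The data side of the translation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the random-palette renderer, and the tiny hand-built scene are all assumptions made for illustration. The sketch only shows that one simulated scene can be rendered with many randomized appearances while its segmentation and depth targets stay fixed, which is the kind of (input, target) pairing a translation model would be trained on.

```python
import numpy as np


def randomize_appearance(class_map, rng, num_classes):
    """Render a class-index map as an RGB image with random per-class colours.

    Hypothetical stand-in for a simulator's visual randomization.
    """
    # One random colour per semantic class, resampled on every call.
    palette = rng.uniform(0.0, 1.0, size=(num_classes, 3))
    image = palette[class_map]  # shape (H, W, 3)
    # Small pixel noise so no two renders are exactly alike.
    image = image + rng.normal(0.0, 0.05, size=image.shape)
    return np.clip(image, 0.0, 1.0)


def make_training_pairs(class_map, depth_map, num_variants, num_classes, seed=0):
    """Build (randomized image, segmentation, depth) triples for one scene.

    The inputs vary in appearance; the targets are identical across variants.
    """
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(num_variants):
        image = randomize_appearance(class_map, rng, num_classes)
        pairs.append((image, class_map, depth_map))
    return pairs


# Toy 4x6 scene (assumed labels): 0 = sky, 1 = ground, 2 = obstacle.
class_map = np.zeros((4, 6), dtype=np.int64)
class_map[2:, :] = 1
class_map[1, 2] = 2
depth_map = np.linspace(1.0, 10.0, 24).reshape(4, 6)

pairs = make_training_pairs(class_map, depth_map, num_variants=3, num_classes=3)
```

A translation network fitted on such pairs would see widely varying inputs mapped to the same canonical output. The abstract's claim is that a model trained this way can also translate real images, so a policy trained on segmentation and depth in simulation can be deployed without real data.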

