Conference on Robot Learning: Latest Publications

Vision-based Uneven BEV Representation Learning with Polar Rasterization and Surface Estimation
Conference on Robot Learning Pub Date: 2022-07-05 DOI: 10.48550/arXiv.2207.01878
Zhi Liu, Shaoyu Chen, Xiaojie Guo, Xinggang Wang, Tianheng Cheng, Hong Zhu, Qian Zhang, Wenyu Liu, Yi Zhang
{"title":"Vision-based Uneven BEV Representation Learning with Polar Rasterization and Surface Estimation","authors":"Zhi Liu, Shaoyu Chen, Xiaojie Guo, Xinggang Wang, Tianheng Cheng, Hong Zhu, Qian Zhang, Wenyu Liu, Yi Zhang","doi":"10.48550/arXiv.2207.01878","DOIUrl":"https://doi.org/10.48550/arXiv.2207.01878","url":null,"abstract":"In this work, we propose PolarBEV for vision-based uneven BEV representation learning. To adapt to the foreshortening effect of camera imaging, we rasterize the BEV space both angularly and radially, and introduce polar embedding decomposition to model the associations among polar grids. Polar grids are rearranged to an array-like regular representation for efficient processing. Besides, to determine the 2D-to-3D correspondence, we iteratively update the BEV surface based on a hypothetical plane, and adopt height-based feature transformation. PolarBEV keeps real-time inference speed on a single 2080Ti GPU, and outperforms other methods for both BEV semantic segmentation and BEV instance segmentation. Thorough ablations are presented to validate the design. The code will be released at url{https://github.com/SuperZ-Liu/PolarBEV}.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116051012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
USHER: Unbiased Sampling for Hindsight Experience Replay
Conference on Robot Learning Pub Date: 2022-07-03 DOI: 10.48550/arXiv.2207.01115
Liam Schramm, Yunfu Deng, Edgar Granados, Abdeslam Boularias
{"title":"USHER: Unbiased Sampling for Hindsight Experience Replay","authors":"Liam Schramm, Yunfu Deng, Edgar Granados, Abdeslam Boularias","doi":"10.48550/arXiv.2207.01115","DOIUrl":"https://doi.org/10.48550/arXiv.2207.01115","url":null,"abstract":"Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL). Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another. This allows for both a minimum density of reward and for generalization across multiple goals. However, this strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment. We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments. We show its effectiveness on a range of robotic systems, including challenging high dimensional stochastic environments.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127786512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Discriminator-Guided Model-Based Offline Imitation Learning
Conference on Robot Learning Pub Date: 2022-07-01 DOI: 10.48550/arXiv.2207.00244
Wenjia Zhang, Haoran Xu, Haoyi Niu, Peng Cheng, Ming Li, Heming Zhang, Guyue Zhou, Xianyuan Zhan
{"title":"Discriminator-Guided Model-Based Offline Imitation Learning","authors":"Wenjia Zhang, Haoran Xu, Haoyi Niu, Peng Cheng, Ming Li, Heming Zhang, Guyue Zhou, Xianyuan Zhan","doi":"10.48550/arXiv.2207.00244","DOIUrl":"https://doi.org/10.48550/arXiv.2207.00244","url":null,"abstract":"Offline imitation learning (IL) is a powerful method to solve decision-making problems from expert demonstrations without reward labels. Existing offline IL methods suffer from severe performance degeneration under limited expert data. Including a learned dynamics model can potentially improve the state-action space coverage of expert data, however, it also faces challenging issues like model approximation/generalization errors and suboptimality of rollout data. In this paper, we propose the Discriminator-guided Model-based offline Imitation Learning (DMIL) framework, which introduces a discriminator to simultaneously distinguish the dynamics correctness and suboptimality of model rollout data against real expert demonstrations. DMIL adopts a novel cooperative-yet-adversarial learning strategy, which uses the discriminator to guide and couple the learning process of the policy and dynamics model, resulting in improved model performance and robustness. Our framework can also be extended to the case when demonstrations contain a large proportion of suboptimal data. Experimental results show that DMIL and its extension achieve superior performance and robustness compared to state-of-the-art offline IL methods under small datasets.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131254668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Learning Diverse and Physically Feasible Dexterous Grasps with Generative Model and Bilevel Optimization
Conference on Robot Learning Pub Date: 2022-07-01 DOI: 10.48550/arXiv.2207.00195
A. Wu, Michelle Guo, Karen Liu
{"title":"Learning Diverse and Physically Feasible Dexterous Grasps with Generative Model and Bilevel Optimization","authors":"A. Wu, Michelle Guo, Karen Liu","doi":"10.48550/arXiv.2207.00195","DOIUrl":"https://doi.org/10.48550/arXiv.2207.00195","url":null,"abstract":"To fully utilize the versatility of a multi-fingered dexterous robotic hand for executing diverse object grasps, one must consider the rich physical constraints introduced by hand-object interaction and object geometry. We propose an integrative approach of combining a generative model and a bilevel optimization (BO) to plan diverse grasp configurations on novel objects. First, a conditional variational autoencoder trained on merely six YCB objects predicts the finger placement directly from the object point cloud. The prediction is then used to seed a nonconvex BO that solves for a grasp configuration under collision, reachability, wrench closure, and friction constraints. Our method achieved an 86.7% success over 120 real world grasping trials on 20 household objects, including unseen and challenging geometries. Through quantitative empirical evaluations, we confirm that grasp configurations produced by our pipeline are indeed guaranteed to satisfy kinematic and dynamic constraints. A video summary of our results is available at youtu.be/9DTrImbN99I.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121367355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Watch and Match: Supercharging Imitation with Regularized Optimal Transport
Conference on Robot Learning Pub Date: 2022-06-30 DOI: 10.48550/arXiv.2206.15469
Siddhant Haldar, Vaibhav Mathur, Denis Yarats, Lerrel Pinto
{"title":"Watch and Match: Supercharging Imitation with Regularized Optimal Transport","authors":"Siddhant Haldar, Vaibhav Mathur, Denis Yarats, Lerrel Pinto","doi":"10.48550/arXiv.2206.15469","DOIUrl":"https://doi.org/10.48550/arXiv.2206.15469","url":null,"abstract":"Imitation learning holds tremendous promise in learning policies efficiently for complex decision making problems. Current state-of-the-art algorithms often use inverse reinforcement learning (IRL), where given a set of expert demonstrations, an agent alternatively infers a reward function and the associated optimal policy. However, such IRL approaches often require substantial online interactions for complex control problems. In this work, we present Regularized Optimal Transport (ROT), a new imitation learning algorithm that builds on recent advances in optimal transport based trajectory-matching. Our key technical insight is that adaptively combining trajectory-matching rewards with behavior cloning can significantly accelerate imitation even with only a few demonstrations. Our experiments on 20 visual control tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark demonstrate an average of 7.8X faster imitation to reach 90% of expert performance compared to prior state-of-the-art methods. On real-world robotic manipulation, with just one demonstration and an hour of online training, ROT achieves an average success rate of 90.1% across 14 tasks.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"605 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116382638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
Fleet-DAgger: Interactive Robot Fleet Learning with Scalable Human Supervision
Conference on Robot Learning Pub Date: 2022-06-29 DOI: 10.48550/arXiv.2206.14349
Ryan Hoque, Lawrence Yunliang Chen, Satvik Sharma, K. Dharmarajan, Brijen Thananjeyan, P. Abbeel, Ken Goldberg
{"title":"Fleet-DAgger: Interactive Robot Fleet Learning with Scalable Human Supervision","authors":"Ryan Hoque, Lawrence Yunliang Chen, Satvik Sharma, K. Dharmarajan, Brijen Thananjeyan, P. Abbeel, Ken Goldberg","doi":"10.48550/arXiv.2206.14349","DOIUrl":"https://doi.org/10.48550/arXiv.2206.14349","url":null,"abstract":"Commercial and industrial deployments of robot fleets at Amazon, Nimble, Plus One, Waymo, and Zoox query remote human teleoperators when robots are at risk or unable to make task progress. With continual learning, interventions from the remote pool of humans can also be used to improve the robot fleet control policy over time. A central question is how to effectively allocate limited human attention. Prior work addresses this in the single-robot, single-human setting; we formalize the Interactive Fleet Learning (IFL) setting, in which multiple robots interactively query and learn from multiple human supervisors. We propose Return on Human Effort (ROHE) as a new metric and Fleet-DAgger, a family of IFL algorithms. We present an open-source IFL benchmark suite of GPU-accelerated Isaac Gym environments for standardized evaluation and development of IFL algorithms. We compare a novel Fleet-DAgger algorithm to 4 baselines with 100 robots in simulation. We also perform a physical block-pushing experiment with 4 ABB YuMi robot arms and 2 remote humans. Experiments suggest that the allocation of humans to robots significantly affects the performance of the fleet, and that the novel Fleet-DAgger algorithm can achieve up to 8.8x higher ROHE than baselines. See https://tinyurl.com/fleet-dagger for supplemental material.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116153148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Masked World Models for Visual Control
Conference on Robot Learning Pub Date: 2022-06-28 DOI: 10.48550/arXiv.2206.14244
Younggyo Seo, Danijar Hafner, Hao Liu, Fangchen Liu, Stephen James, Kimin Lee, P. Abbeel
{"title":"Masked World Models for Visual Control","authors":"Younggyo Seo, Danijar Hafner, Hao Liu, Fangchen Liu, Stephen James, Kimin Lee, P. Abbeel","doi":"10.48550/arXiv.2206.14244","DOIUrl":"https://doi.org/10.48550/arXiv.2206.14244","url":null,"abstract":"Visual model-based reinforcement learning (RL) has the potential to enable sample-efficient robot learning from visual observations. Yet the current approaches typically train a single model end-to-end for learning both visual representations and dynamics, making it difficult to accurately model the interaction between robots and small objects. In this work, we introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning. Specifically, we train an autoencoder with convolutional layers and vision transformers (ViT) to reconstruct pixels given masked convolutional features, and learn a latent dynamics model that operates on the representations from the autoencoder. Moreover, to encode task-relevant information, we introduce an auxiliary reward prediction objective for the autoencoder. We continually update both autoencoder and dynamics model using online samples collected from environment interaction. We demonstrate that our decoupling approach achieves state-of-the-art performance on a variety of visual robotic tasks from Meta-world and RLBench, e.g., we achieve 81.7% success rate on 50 visual robotic manipulation tasks from Meta-world, while the baseline achieves 67.9%. Code is available on the project website: https://sites.google.com/view/mwm-rl.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"85 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131958440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 56
Rethinking Optimization with Differentiable Simulation from a Global Perspective
Conference on Robot Learning Pub Date: 2022-06-28 DOI: 10.48550/arXiv.2207.00167
Rika Antonova, Jingyun Yang, Krishna Murthy Jatavallabhula, J. Bohg
{"title":"Rethinking Optimization with Differentiable Simulation from a Global Perspective","authors":"Rika Antonova, Jingyun Yang, Krishna Murthy Jatavallabhula, J. Bohg","doi":"10.48550/arXiv.2207.00167","DOIUrl":"https://doi.org/10.48550/arXiv.2207.00167","url":null,"abstract":"Differentiable simulation is a promising toolkit for fast gradient-based policy optimization and system identification. However, existing approaches to differentiable simulation have largely tackled scenarios where obtaining smooth gradients has been relatively easy, such as systems with mostly smooth dynamics. In this work, we study the challenges that differentiable simulation presents when it is not feasible to expect that a single descent reaches a global optimum, which is often a problem in contact-rich scenarios. We analyze the optimization landscapes of diverse scenarios that contain both rigid bodies and deformable objects. In dynamic environments with highly deformable objects and fluids, differentiable simulators produce rugged landscapes with nonetheless useful gradients in some parts of the space. We propose a method that combines Bayesian optimization with semi-local 'leaps' to obtain a global search method that can use gradients effectively, while also maintaining robust performance in regions with noisy gradients. We show that our approach outperforms several gradient-based and gradient-free baselines on an extensive set of experiments in simulation, and also validate the method using experiments with a real robot and deformables. Videos and supplementary materials are available at https://tinyurl.com/globdiff","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130450718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
DayDreamer: World Models for Physical Robot Learning
Conference on Robot Learning Pub Date: 2022-06-28 DOI: 10.48550/arXiv.2206.14176
Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, P. Abbeel
{"title":"DayDreamer: World Models for Physical Robot Learning","authors":"Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, P. Abbeel","doi":"10.48550/arXiv.2206.14176","DOIUrl":"https://doi.org/10.48550/arXiv.2206.14176","url":null,"abstract":"To solve tasks in complex environments, robots need to learn from experience. Deep reinforcement learning is a common approach to robot learning but requires a large amount of trial and error to learn, limiting its deployment in the physical world. As a consequence, many advances in robot learning rely on simulators. On the other hand, learning inside of simulators fails to capture the complexity of the real world, is prone to simulator inaccuracies, and the resulting behaviors do not adapt to changes in the world. The Dreamer algorithm has recently shown great promise for learning from small amounts of interaction by planning within a learned world model, outperforming pure reinforcement learning in video games. Learning a world model to predict the outcomes of potential actions enables planning in imagination, reducing the amount of trial and error needed in the real environment. However, it is unknown whether Dreamer can facilitate faster learning on physical robots. In this paper, we apply Dreamer to 4 robots to learn online and directly in the real world, without simulators. Dreamer trains a quadruped robot to roll off its back, stand up, and walk from scratch and without resets in only 1 hour. We then push the robot and find that Dreamer adapts within 10 minutes to withstand perturbations or quickly roll over and stand back up. On two different robotic arms, Dreamer learns to pick and place multiple objects directly from camera images and sparse rewards, approaching human performance. On a wheeled robot, Dreamer learns to navigate to a goal position purely from camera images, automatically resolving ambiguity about the robot orientation. Using the same hyperparameters across all experiments, we find that Dreamer is capable of online learning in the real world, establishing a strong baseline. We release our infrastructure for future applications of world models to robot learning.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"6 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114023833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 105
LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation
Conference on Robot Learning Pub Date: 2022-06-27 DOI: 10.48550/arXiv.2206.13294
Florent Bartoccioni, Éloi Zablocki, Andrei Bursuc, Patrick Pérez, M. Cord, Alahari Karteek
{"title":"LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation","authors":"Florent Bartoccioni, 'Eloi Zablocki, Andrei Bursuc, Patrick P'erez, M. Cord, Alahari Karteek","doi":"10.48550/arXiv.2206.13294","DOIUrl":"https://doi.org/10.48550/arXiv.2206.13294","url":null,"abstract":"Recent works in autonomous driving have widely adopted the bird's-eye-view (BEV) semantic map as an intermediate representation of the world. Online prediction of these BEV maps involves non-trivial operations such as multi-camera data extraction as well as fusion and projection into a common topview grid. This is usually done with error-prone geometric operations (e.g., homography or back-projection from monocular depth estimation) or expensive direct dense mapping between image pixels and pixels in BEV (e.g., with MLP or attention). In this work, we present 'LaRa', an efficient encoder-decoder, transformer-based model for vehicle semantic segmentation from multiple cameras. Our approach uses a system of cross-attention to aggregate information over multiple sensors into a compact, yet rich, collection of latent representations. These latent representations, after being processed by a series of self-attention blocks, are then reprojected with a second cross-attention in the BEV space. We demonstrate that our model outperforms the best previous works using transformers on nuScenes. The code and trained models are available at https://github.com/valeoai/LaRa","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128479094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12