Conference on Robot Learning: Latest Papers (Page 2)

Particle-Based Score Estimation for State Space Model Learning in Autonomous Driving

Conference on Robot Learning Pub Date : 2022-12-14 DOI: 10.48550/arXiv.2212.06968

Angad Singh, Omar Makhlouf, Maximilian Igl, J. Messias, A. Doucet, Shimon Whiteson

Abstract: Multi-object state estimation is a fundamental problem for robotic applications where a robot must interact with other moving objects. Typically, other objects' relevant state features are not directly observable, and must instead be inferred from observations. Particle filtering can perform such inference given approximate transition and observation models. However, these models are often unknown a priori, yielding a difficult parameter estimation problem since observations jointly carry transition and observation noise. In this work, we consider learning maximum-likelihood parameters using particle methods. Recent methods addressing this problem typically differentiate through time in a particle filter, which requires workarounds to the non-differentiable resampling step that yield biased or high-variance gradient estimates. By contrast, we exploit Fisher's identity to obtain a particle-based approximation of the score function (the gradient of the log likelihood) that yields a low-variance estimate while only requiring stepwise differentiation through the transition and observation models. We apply our method to real data collected from autonomous vehicles (AVs) and show that it learns better models than existing techniques and is more stable in training, yielding an effective smoother for tracking the trajectories of vehicles around an AV.
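Fisher's identity, which the abstract above relies on, states that the score equals the posterior expectation of the gradient of the joint log density: grad log p(y) = E[grad log p(x, y) | y]. The following is a minimal sketch that checks this numerically on a toy one-step model. The model (x ~ N(0, 1), y | x ~ N(theta * x, 1)), the function names, and the use of prior samples as particles are all assumptions chosen for illustration; none of this is taken from the paper.

```python
import math
import random


def exact_score(theta: float, y: float) -> float:
    """Closed-form d/dtheta log p(y) for x ~ N(0, 1), y | x ~ N(theta * x, 1)."""
    s = 1.0 + theta * theta  # marginal variance of y
    return -theta / s + theta * y * y / (s * s)


def particle_score(theta: float, y: float, n: int = 100_000, seed: int = 0) -> float:
    """Self-normalised particle estimate of the score via Fisher's identity."""
    rng = random.Random(seed)
    weight_sum = 0.0
    weighted_grad_sum = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                    # particle drawn from the prior
        w = math.exp(-0.5 * (y - theta * x) ** 2)  # unnormalised observation likelihood
        # The prior does not depend on theta, so the gradient of the joint
        # log density reduces to d/dtheta log g(y | x) = (y - theta * x) * x.
        grad = (y - theta * x) * x
        weight_sum += w
        weighted_grad_sum += w * grad
    return weighted_grad_sum / weight_sum


if __name__ == "__main__":
    print("exact   :", exact_score(0.8, 1.2))
    print("particle:", particle_score(0.8, 1.2))
```

Only the per-particle observation term is differentiated here, and the weighting and sampling steps are never differentiated, which mirrors the abstract's point that no gradient has to pass through resampling.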

Citations: 0

Cross-Domain Transfer via Semantic Skill Imitation

Conference on Robot Learning Pub Date : 2022-12-14 DOI: 10.48550/arXiv.2212.07407

Karl Pertsch, Ruta Desai, Vikash Kumar, Franziska Meier, Joseph J. Lim, Dhruv Batra, Akshara Rai

Citations: 3

DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles

Conference on Robot Learning Pub Date : 2022-12-13 DOI: 10.48550/arXiv.2212.06437

Peter Karkus, B. Ivanovic, Shie Mannor, M. Pavone

Abstract: Autonomous vehicle (AV) stacks are typically built in a modular fashion, with explicit components performing detection, tracking, prediction, planning, control, etc. While modularity improves reusability, interpretability, and generalizability, it also suffers from compounding errors, information bottlenecks, and integration challenges. To overcome these challenges, a prominent approach is to convert the AV stack into an end-to-end neural network and train it with data. While such approaches have achieved impressive results, they typically lack interpretability and reusability, and they eschew principled analytical components, such as planning and control, in favor of deep neural networks. To enable the joint optimization of AV stacks while retaining modularity, we present DiffStack, a differentiable and modular stack for prediction, planning, and control. Crucially, our model-based planning and control algorithms leverage recent advancements in differentiable optimization to produce gradients, enabling optimization of upstream components, such as prediction, via backpropagation through planning and control. Our results on the nuScenes dataset indicate that end-to-end training with DiffStack yields substantial improvements in open-loop and closed-loop planning metrics by, e.g., learning to make fewer prediction errors that would affect planning. Beyond these immediate benefits, DiffStack opens up new opportunities for fully data-driven yet modular and interpretable AV architectures. Project website: https://sites.google.com/view/diffstack

Citations: 15

MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare

Conference on Robot Learning Pub Date : 2022-12-13 DOI: 10.48550/arXiv.2212.06870

Yann Labbé, Lucas Manuelli, Arsalan Mousavian, Stephen Tyree, Stan Birchfield, Jonathan Tremblay, Justin Carpentier, Mathieu Aubry, D. Fox, Josef Sivic

Abstract: We introduce MegaPose, a method to estimate the 6D pose of novel objects, that is, objects unseen during training. At inference time, the method only assumes knowledge of (i) a region of interest displaying the object in the image and (ii) a CAD model of the observed object. The contributions of this work are threefold. First, we present a 6D pose refiner based on a render & compare strategy which can be applied to novel objects. The shape and coordinate system of the novel object are provided as inputs to the network by rendering multiple synthetic views of the object's CAD model. Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner. Third, we introduce a large-scale synthetic dataset of photorealistic images of thousands of objects with diverse visual and shape properties and show that this diversity is crucial to obtain good generalization performance on novel objects. We train our approach on this large synthetic dataset and apply it without retraining to hundreds of novel objects in real images from several pose estimation benchmarks. Our approach achieves state-of-the-art performance on the ModelNet and YCB-Video datasets. An extensive evaluation on the 7 core datasets of the BOP challenge demonstrates that our approach achieves performance competitive with existing approaches that require access to the target objects during training. Code, dataset and trained models are available on the project page: https://megapose6d.github.io/.

Citations: 31

ROAD: Learning an Implicit Recursive Octree Auto-Decoder to Efficiently Encode 3D Shapes

Conference on Robot Learning Pub Date : 2022-12-12 DOI: 10.48550/arXiv.2212.06193

Sergey Zakharov, Rares Ambrus, Katherine Liu, Adrien Gaidon

Abstract: Compact and accurate representations of 3D shapes are central to many perception and robotics tasks. State-of-the-art learning-based methods can reconstruct single objects but scale poorly to large datasets. We present a novel recursive implicit representation to efficiently and accurately encode large datasets of complex 3D shapes by recursively traversing an implicit octree in latent space. Our implicit Recursive Octree Auto-Decoder (ROAD) learns a hierarchically structured latent space enabling state-of-the-art reconstruction results at a compression ratio above 99%. We also propose an efficient curriculum learning scheme that naturally exploits the coarse-to-fine properties of the underlying octree spatial representation. We explore the scaling law relating latent space dimension, dataset size, and reconstruction accuracy, showing that increasing the latent space dimension is enough to scale to large shape datasets. Finally, we show that our learned latent space encodes a coarse-to-fine hierarchical structure yielding reusable latents across different levels of detail, and we provide qualitative evidence of generalization to novel shapes outside the training set.
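To make the coarse-to-fine octree structure mentioned in the abstract concrete, here is a minimal sketch of recursive octree subdivision over a point set. It is purely geometric and is not ROAD's learned latent-space decoder; the cell representation, function names, and example points are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[Point, float]  # (center, half-size of the cube)


def children(center: Point, half: float) -> List[Cell]:
    """Split a cube into its eight equally sized octants."""
    q = half / 2.0
    cx, cy, cz = center
    return [((cx + dx * q, cy + dy * q, cz + dz * q), q)
            for dx in (-1, 1) for dy in (-1, 1) for dz in (-1, 1)]


def contains(cell: Cell, p: Point) -> bool:
    """True if point p lies inside (or on the boundary of) the cube."""
    (cx, cy, cz), h = cell
    return abs(p[0] - cx) <= h and abs(p[1] - cy) <= h and abs(p[2] - cz) <= h


def occupied_leaves(points: Sequence[Point], cell: Cell, depth: int) -> List[Cell]:
    """Recursively descend only into occupied cells, down to the given depth."""
    inside = [p for p in points if contains(cell, p)]
    if not inside:
        return []          # empty cells are pruned and never subdivided
    if depth == 0:
        return [cell]      # finest level reached: report the occupied cell
    leaves: List[Cell] = []
    for child in children(*cell):
        leaves.extend(occupied_leaves(inside, child, depth - 1))
    return leaves


if __name__ == "__main__":
    pts = [(0.6, 0.6, 0.6), (-0.6, -0.3, 0.1)]
    root = ((0.0, 0.0, 0.0), 1.0)
    for leaf in occupied_leaves(pts, root, depth=2):
        print(leaf)
```

Because empty octants are pruned at every level, the number of stored cells grows with the occupied surface rather than with the full volume, which is the general property that makes octrees attractive for compact shape encoding.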

Citations: 3

MIRA: Mental Imagery for Robotic Affordances

Conference on Robot Learning Pub Date : 2022-12-12 DOI: 10.48550/arXiv.2212.06088

Yilun Du

Citations: 14

Where To Start? Transferring Simple Skills to Complex Environments

Conference on Robot Learning Pub Date : 2022-12-12 DOI: 10.48550/arXiv.2212.06111

Vitalis Vosylius, Edward Johns

Citations: 6

Towards Scale Balanced 6-DoF Grasp Detection in Cluttered Scenes

Conference on Robot Learning Pub Date : 2022-12-10 DOI: 10.48550/arXiv.2212.05275

Haoxiang Ma, Di Huang

Abstract: In this paper, we focus on the problem of feature learning in the presence of scale imbalance for 6-DoF grasp detection and propose a novel approach to especially address the difficulty in dealing with small-scale samples. A Multi-scale Cylinder Grouping (MsCG) module is presented to enhance local geometry representation by combining multi-scale cylinder features and global context. Moreover, a Scale Balanced Learning (SBL) loss and an Object Balanced Sampling (OBS) strategy are designed, where SBL enlarges the gradients of the samples whose scales are in low frequency by a priori weights while OBS captures more points on small-scale objects with the help of an auxiliary segmentation network. They alleviate the influence of the uneven distribution of grasp scales in training and inference respectively. In addition, Noisy-clean Mix (NcM) data augmentation is introduced to facilitate training, aiming to bridge the domain gap between synthetic and raw scenes in an efficient way by generating more data which mix them into single ones at instance-level. Extensive experiments are conducted on the GraspNet-1Billion benchmark and competitive results are reached with significant gains on small-scale cases. Besides, the performance of real-world grasping highlights its generalization ability. Our code is available at https://github.com/mahaoxiang822/Scale-Balanced-Grasp.
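The idea of up-weighting samples from rare scale ranges can be illustrated with a generic inverse-frequency weighting scheme. This is a sketch under stated assumptions: the abstract does not give the SBL formula, so the weighting rule, the binning of scales into integer bin ids, and both function names below are hypothetical and are not the paper's implementation.

```python
from collections import Counter
from typing import List, Sequence


def inverse_frequency_weights(scale_bins: Sequence[int]) -> List[float]:
    """Per-sample weights inversely proportional to the frequency of each scale bin.

    Weights are normalised so that they sum to the number of samples.
    """
    counts = Counter(scale_bins)
    n, k = len(scale_bins), len(counts)
    return [n / (k * counts[b]) for b in scale_bins]


def weighted_mean_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-sample losses."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)


if __name__ == "__main__":
    bins = [0, 0, 0, 1]            # three large-scale samples, one small-scale sample
    losses = [0.2, 0.2, 0.2, 1.0]  # hypothetical per-sample losses
    w = inverse_frequency_weights(bins)
    print(w)
    print(weighted_mean_loss(losses, w))
```

With this rule every scale bin contributes the same total weight, so a single rare small-scale sample influences the averaged loss (and therefore its gradient) as much as the whole group of common samples.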

Citations: 2

Visuotactile Affordances for Cloth Manipulation with Local Control

Conference on Robot Learning Pub Date : 2022-12-09 DOI: 10.48550/arXiv.2212.05108

N. Sunil, Shaoxiong Wang, Y. She, E. Adelson, Alberto Rodriguez

Citations: 10

VideoDex: Learning Dexterity from Internet Videos

Conference on Robot Learning Pub Date : 2022-12-08 DOI: 10.48550/arXiv.2212.04498

Kenneth Shaw, Shikhar Bahl, Deepak Pathak

Citations: 18