Conference on Robot Learning: Latest Publications

Last-Mile Embodied Visual Navigation
Conference on Robot Learning Pub Date: 2022-11-21 DOI: 10.48550/arXiv.2211.11746
Justin Wasserman, Karmesh Yadav, Girish V. Chowdhary, Abhi Gupta, Unnat Jain
{"title":"Last-Mile Embodied Visual Navigation","authors":"Justin Wasserman, Karmesh Yadav, Girish V. Chowdhary, Abhi Gupta, Unnat Jain","doi":"10.48550/arXiv.2211.11746","DOIUrl":"https://doi.org/10.48550/arXiv.2211.11746","url":null,"abstract":"Realistic long-horizon tasks like image-goal navigation involve exploratory and exploitative phases. Assigned with an image of the goal, an embodied agent must explore to discover the goal, i.e., search efficiently using learned priors. Once the goal is discovered, the agent must accurately calibrate the last-mile of navigation to the goal. As with any robust system, switches between exploratory goal discovery and exploitative last-mile navigation enable better recovery from errors. Following these intuitive guide rails, we propose SLING to improve the performance of existing image-goal navigation systems. Entirely complementing prior methods, we focus on last-mile navigation and leverage the underlying geometric structure of the problem with neural descriptors. With simple but effective switches, we can easily connect SLING with heuristic, reinforcement learning, and neural modular policies. On a standardized image-goal navigation benchmark (Hahn et al. 2021), we improve performance across policies, scenes, and episode complexity, raising the state-of-the-art from 45% to 55% success rate. Beyond photorealistic simulation, we conduct real-robot experiments in three physical scenes and find these improvements to transfer well to real environments.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125928859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
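The switching structure lends itself to a compact sketch. Below, a hypothetical last-mile step turns an estimated goal position in the agent's frame (which SLING obtains from neural keypoint descriptors; here it is simply an input) into a discrete navigation action. The thresholds and action names are illustrative assumptions, not the paper's implementation.

```python
import math

# Hypothetical last-mile controller: reduce an estimated goal position in
# the agent's frame to a discrete action. In SLING the (goal_x, goal_z)
# estimate comes from neural keypoint descriptors; here it is an input.
def lastmile_action(goal_x, goal_z, stop_radius=0.25,
                    turn_eps=math.radians(15)):
    """goal_x: lateral offset (m, +right); goal_z: forward distance (m)."""
    if math.hypot(goal_x, goal_z) < stop_radius:
        return "STOP"
    heading_err = math.atan2(goal_x, goal_z)  # 0 rad = straight ahead
    if heading_err > turn_eps:
        return "TURN_RIGHT"
    if heading_err < -turn_eps:
        return "TURN_LEFT"
    return "MOVE_FORWARD"

print(lastmile_action(0.1, 2.0))  # MOVE_FORWARD
print(lastmile_action(1.0, 0.5))  # TURN_RIGHT
```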
Deep Projective Rotation Estimation through Relative Supervision
Conference on Robot Learning Pub Date: 2022-11-21 DOI: 10.48550/arXiv.2211.11182
Brian Okorn, Chuer Pan, M. Hebert, David Held
{"title":"Deep Projective Rotation Estimation through Relative Supervision","authors":"Brian Okorn, Chuer Pan, M. Hebert, David Held","doi":"10.48550/arXiv.2211.11182","DOIUrl":"https://doi.org/10.48550/arXiv.2211.11182","url":null,"abstract":"Orientation estimation is the core to a variety of vision and robotics tasks such as camera and object pose estimation. Deep learning has offered a way to develop image-based orientation estimators; however, such estimators often require training on a large labeled dataset, which can be time-intensive to collect. In this work, we explore whether self-supervised learning from unlabeled data can be used to alleviate this issue. Specifically, we assume access to estimates of the relative orientation between neighboring poses, such that can be obtained via a local alignment method. While self-supervised learning has been used successfully for translational object keypoints, in this work, we show that naively applying relative supervision to the rotational group $SO(3)$ will often fail to converge due to the non-convexity of the rotational space. To tackle this challenge, we propose a new algorithm for self-supervised orientation estimation which utilizes Modified Rodrigues Parameters to stereographically project the closed manifold of $SO(3)$ to the open manifold of $mathbb{R}^{3}$, allowing the optimization to be done in an open Euclidean space. We empirically validate the benefits of the proposed algorithm for rotational averaging problem in two settings: (1) direct optimization on rotation parameters, and (2) optimization of parameters of a convolutional neural network that predicts object orientations from images. In both settings, we demonstrate that our proposed algorithm is able to converge to a consistent relative orientation frame much faster than algorithms that purely operate in the $SO(3)$ space. Additional information can be found at https://sites.google.com/view/deep-projective-rotation/home .","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127297678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
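The projection at the core of the method is compact enough to sketch directly. Modified Rodrigues Parameters map a rotation of angle θ about axis e to p = e·tan(θ/4); the sketch below is our own numpy rendering with a w-first quaternion convention, not code from the paper.

```python
import numpy as np

def quat_to_mrp(q):
    """Unit quaternion [w, x, y, z] -> MRP vector p = e * tan(theta/4)."""
    w, v = q[0], np.asarray(q[1:])
    if w < 0:                       # resolve quaternion double cover
        w, v = -w, -v
    return v / (1.0 + w)            # stereographic projection to R^3

def mrp_to_quat(p):
    s = float(np.dot(p, p))
    w = (1.0 - s) / (1.0 + s)
    return np.concatenate([[w], 2.0 * np.asarray(p) / (1.0 + s)])

q = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8), 0.0, 0.0])  # 45 deg about x
p = quat_to_mrp(q)
assert np.allclose(mrp_to_quat(p), q)                     # exact round trip
assert np.isclose(np.linalg.norm(p), np.tan(np.pi / 16))  # |p| = tan(theta/4)
```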
Safe Control Under Input Limits with Neural Control Barrier Functions
Conference on Robot Learning Pub Date: 2022-11-20 DOI: 10.48550/arXiv.2211.11056
Simin Liu, Changliu Liu, J. Dolan
{"title":"Safe Control Under Input Limits with Neural Control Barrier Functions","authors":"Simin Liu, Changliu Liu, J. Dolan","doi":"10.48550/arXiv.2211.11056","DOIUrl":"https://doi.org/10.48550/arXiv.2211.11056","url":null,"abstract":"We propose new methods to synthesize control barrier function (CBF)-based safe controllers that avoid input saturation, which can cause safety violations. In particular, our method is created for high-dimensional, general nonlinear systems, for which such tools are scarce. We leverage techniques from machine learning, like neural networks and deep learning, to simplify this challenging problem in nonlinear control design. The method consists of a learner-critic architecture, in which the critic gives counterexamples of input saturation and the learner optimizes a neural CBF to eliminate those counterexamples. We provide empirical results on a 10D state, 4D input quadcopter-pendulum system. Our learned CBF avoids input saturation and maintains safety over nearly 100% of trials.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132992545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
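A toy rendering of the learner-critic loop, under heavy assumptions: the system is a double integrator rather than the paper's quadcopter-pendulum, the constants are illustrative, and a full implementation would also anchor h on known safe/unsafe states. The critic samples states and flags those where even the saturated best-case input cannot satisfy the CBF condition; the learner takes a hinge-loss step to remove them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
u_max, alpha = 1.0, 1.0
h = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))  # neural CBF
opt = torch.optim.Adam(h.parameters(), lr=1e-3)

def cbf_residual(x):
    """max_{|u|<=u_max} dh/dt + alpha*h for the double integrator xdot=[v,u]."""
    x = x.clone().requires_grad_(True)
    hx = h(x).squeeze(-1)
    grad = torch.autograd.grad(hx.sum(), x, create_graph=True)[0]
    # dh/dt = dh/dpos * v + dh/dvel * u; the best admissible u saturates.
    hdot_best = grad[:, 0] * x[:, 1] + grad[:, 1].abs() * u_max
    return hdot_best + alpha * hx, hx

for it in range(200):
    x = torch.empty(256, 2).uniform_(-2.0, 2.0)   # critic: sample states
    res, hx = cbf_residual(x)
    bad = (hx >= 0) & (res < 0)   # counterexamples: "safe" states where even
    if bad.any():                 # the saturated input violates the condition
        loss = torch.relu(-res[bad]).mean()       # learner: hinge penalty
        opt.zero_grad(); loss.backward(); opt.step()
# A full implementation also fits h to known safe/unsafe states so the
# learned safe set stays nonempty.
```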
DexPoint: Generalizable Point Cloud Reinforcement Learning for Sim-to-Real Dexterous Manipulation
Conference on Robot Learning Pub Date: 2022-11-17 DOI: 10.48550/arXiv.2211.09423
Yuzhe Qin, Binghao Huang, Zhao-Heng Yin, Hao Su, Xiaolong Wang
{"title":"DexPoint: Generalizable Point Cloud Reinforcement Learning for Sim-to-Real Dexterous Manipulation","authors":"Yuzhe Qin, Binghao Huang, Zhao-Heng Yin, Hao Su, Xiaolong Wang","doi":"10.48550/arXiv.2211.09423","DOIUrl":"https://doi.org/10.48550/arXiv.2211.09423","url":null,"abstract":"We propose a sim-to-real framework for dexterous manipulation which can generalize to new objects of the same category in the real world. The key of our framework is to train the manipulation policy with point cloud inputs and dexterous hands. We propose two new techniques to enable joint learning on multiple objects and sim-to-real generalization: (i) using imagined hand point clouds as augmented inputs; and (ii) designing novel contact-based rewards. We empirically evaluate our method using an Allegro Hand to grasp novel objects in both simulation and real world. To the best of our knowledge, this is the first policy learning-based framework that achieves such generalization results with dexterous hands. Our project page is available at https://yzqin.github.io/dexpoint","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131512334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
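Technique (i) can be sketched as a simple input-assembly step: hand points "imagined" from forward-kinematics link poses are concatenated with the observed cloud, with a one-hot channel distinguishing the two. All shapes and names below are illustrative stand-ins, not the authors' code.

```python
import numpy as np

def build_policy_input(observed_pts, link_pts, link_poses):
    """observed_pts: (N,3) camera points. link_pts: dict of (M,3) points
    pre-sampled on each hand link, in link frame. link_poses: dict of (4,4)
    link poses from forward kinematics at the current joint configuration."""
    imagined = np.concatenate(
        [link_pts[k] @ link_poses[k][:3, :3].T + link_poses[k][:3, 3]
         for k in link_pts], axis=0)
    # One-hot segment channel: [1,0] = observed, [0,1] = imagined hand.
    obs = np.hstack([observed_pts, np.tile([1.0, 0.0], (len(observed_pts), 1))])
    img = np.hstack([imagined, np.tile([0.0, 1.0], (len(imagined), 1))])
    return np.vstack([obs, img])

cloud = np.random.rand(512, 3)
out = build_policy_input(cloud, {"palm": np.random.rand(64, 3)},
                         {"palm": np.eye(4)})
print(out.shape)  # (576, 5): xyz plus the observed/imagined indicator
```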
TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation
Conference on Robot Learning Pub Date: 2022-11-17 DOI: 10.48550/arXiv.2211.09325
Chuer Pan, Brian Okorn, Harry Zhang, Ben Eisner, David Held
{"title":"TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation","authors":"Chuer Pan, Brian Okorn, Harry Zhang, Ben Eisner, David Held","doi":"10.48550/arXiv.2211.09325","DOIUrl":"https://doi.org/10.48550/arXiv.2211.09325","url":null,"abstract":"How do we imbue robots with the ability to efficiently manipulate unseen objects and transfer relevant skills based on demonstrations? End-to-end learning methods often fail to generalize to novel objects or unseen configurations. Instead, we focus on the task-specific pose relationship between relevant parts of interacting objects. We conjecture that this relationship is a generalizable notion of a manipulation task that can transfer to new objects in the same category; examples include the relationship between the pose of a pan relative to an oven or the pose of a mug relative to a mug rack. We call this task-specific pose relationship\"cross-pose\"and provide a mathematical definition of this concept. We propose a vision-based system that learns to estimate the cross-pose between two objects for a given manipulation task using learned cross-object correspondences. The estimated cross-pose is then used to guide a downstream motion planner to manipulate the objects into the desired pose relationship (placing a pan into the oven or the mug onto the mug rack). We demonstrate our method's capability to generalize to unseen objects, in some cases after training on only 10 demonstrations in the real world. Results show that our system achieves state-of-the-art performance in both simulated and real-world experiments across a number of tasks. Supplementary information and videos can be found at https://sites.google.com/view/tax-pose/home.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126420955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
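Once the network produces cross-object correspondences, the rigid cross-pose falls out of a weighted least-squares fit. The sketch below shows that geometric core (weighted Kabsch / orthogonal Procrustes); the correspondences X, Y and weights w stand in for the learned network's outputs.

```python
import numpy as np

def weighted_kabsch(X, Y, w):
    """Rigid (R, t) minimizing sum_i w_i ||R x_i + t - y_i||^2."""
    w = w / w.sum()
    mx, my = w @ X, w @ Y                      # weighted centroids
    H = (X - mx).T @ np.diag(w) @ (Y - my)     # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, my - R @ mx

# Round-trip check with a known rigid transform and uniform weights:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Q = np.linalg.qr(rng.normal(size=(3, 3)))[0]
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1
Y = X @ Q.T + np.array([0.1, -0.2, 0.3])
R, t = weighted_kabsch(X, Y, np.ones(100))
assert np.allclose(Y, X @ R.T + t, atol=1e-6)
```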
SE(3)-Equivariant Relational Rearrangement with Neural Descriptor Fields
Conference on Robot Learning Pub Date: 2022-11-17 DOI: 10.48550/arXiv.2211.09786
A. Simeonov, Yilun Du, Lin Yen-Chen, Alberto Rodriguez, L. Kaelbling, Tomas Lozano-Perez, Pulkit Agrawal
{"title":"SE(3)-Equivariant Relational Rearrangement with Neural Descriptor Fields","authors":"A. Simeonov, Yilun Du, Lin Yen-Chen, Alberto Rodriguez, L. Kaelbling, Tomas Lozano-Perez, Pulkit Agrawal","doi":"10.48550/arXiv.2211.09786","DOIUrl":"https://doi.org/10.48550/arXiv.2211.09786","url":null,"abstract":"We present a method for performing tasks involving spatial relations between novel object instances initialized in arbitrary poses directly from point cloud observations. Our framework provides a scalable way for specifying new tasks using only 5-10 demonstrations. Object rearrangement is formalized as the question of finding actions that configure task-relevant parts of the object in a desired alignment. This formalism is implemented in three steps: assigning a consistent local coordinate frame to the task-relevant object parts, determining the location and orientation of this coordinate frame on unseen object instances, and executing an action that brings these frames into the desired alignment. We overcome the key technical challenge of determining task-relevant local coordinate frames from a few demonstrations by developing an optimization method based on Neural Descriptor Fields (NDFs) and a single annotated 3D keypoint. An energy-based learning scheme to model the joint configuration of the objects that satisfies a desired relational task further improves performance. The method is tested on three multi-object rearrangement tasks in simulation and on a real robot. Project website, videos, and code: https://anthonysimeonov.github.io/r-ndf/","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121298659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
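A toy sketch of the NDF-based optimization step: an SE(3) pose is optimized so that descriptors evaluated at transformed query points match a demonstration target. The frozen random MLP below stands in for a trained descriptor field (which would also condition on the object point cloud), so this illustrates only the optimization mechanics.

```python
import torch

torch.manual_seed(0)
# Frozen random MLP standing in for a trained Neural Descriptor Field.
f = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.Tanh(),
                        torch.nn.Linear(64, 32))
for prm in f.parameters():
    prm.requires_grad_(False)

def hat(k):  # 3-vector -> skew-symmetric matrix
    z = torch.zeros((), dtype=k.dtype)
    return torch.stack([torch.stack([z, -k[2], k[1]]),
                        torch.stack([k[2], z, -k[0]]),
                        torch.stack([-k[1], k[0], z])])

def rodrigues(w):  # axis-angle -> rotation matrix (differentiable)
    th = w.norm() + 1e-9
    K = hat(w / th)
    return torch.eye(3) + torch.sin(th) * K + (1 - torch.cos(th)) * (K @ K)

query = torch.randn(8, 3)            # rigid set of query points
with torch.no_grad():                # "demonstration" descriptor target
    target = f(query @ rodrigues(torch.tensor([0.3, -0.2, 0.1])).T + 0.5)

w = (0.01 * torch.randn(3)).requires_grad_(True)   # axis-angle
t = torch.zeros(3, requires_grad=True)             # translation
opt = torch.optim.Adam([w, t], lr=5e-2)
for _ in range(300):
    loss = (f(query @ rodrigues(w).T + t) - target).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()   # descriptor-matching energy
```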
Interpretable Self-Aware Neural Networks for Robust Trajectory Prediction
Conference on Robot Learning Pub Date: 2022-11-16 DOI: 10.48550/arXiv.2211.08701
Masha Itkina, Mykel J. Kochenderfer
{"title":"Interpretable Self-Aware Neural Networks for Robust Trajectory Prediction","authors":"Masha Itkina, Mykel J. Kochenderfer","doi":"10.48550/arXiv.2211.08701","DOIUrl":"https://doi.org/10.48550/arXiv.2211.08701","url":null,"abstract":"Although neural networks have seen tremendous success as predictive models in a variety of domains, they can be overly confident in their predictions on out-of-distribution (OOD) data. To be viable for safety-critical applications, like autonomous vehicles, neural networks must accurately estimate their epistemic or model uncertainty, achieving a level of system self-awareness. Techniques for epistemic uncertainty quantification often require OOD data during training or multiple neural network forward passes during inference. These approaches may not be suitable for real-time performance on high-dimensional inputs. Furthermore, existing methods lack interpretability of the estimated uncertainty, which limits their usefulness both to engineers for further system development and to downstream modules in the autonomy stack. We propose the use of evidential deep learning to estimate the epistemic uncertainty over a low-dimensional, interpretable latent space in a trajectory prediction setting. We introduce an interpretable paradigm for trajectory prediction that distributes the uncertainty among the semantic concepts: past agent behavior, road structure, and social context. We validate our approach on real-world autonomous driving data, demonstrating superior performance over state-of-the-art baselines. Our code is available at: https://github.com/sisl/InterpretableSelfAwarePrediction.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"61 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130949408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
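The evidential layer itself is a one-screen idea: the network emits non-negative evidence over K latent modes, the evidence defines a Dirichlet, and the Dirichlet's spread yields epistemic uncertainty in a single forward pass. A minimal sketch, with the trajectory decoder omitted:

```python
import torch
import torch.nn.functional as F

def evidential_head(logits):
    evidence = F.softplus(logits)        # non-negative evidence, e >= 0
    alpha = evidence + 1.0               # Dirichlet concentration parameters
    S = alpha.sum(-1, keepdim=True)
    prob = alpha / S                     # expected categorical probabilities
    epistemic = logits.shape[-1] / S.squeeze(-1)   # vacuity in (0, 1]
    return prob, epistemic

logits = torch.tensor([[5.0, 0.1, 0.1],   # confident, in-distribution-like
                       [0.1, 0.1, 0.1]])  # little evidence, OOD-like
prob, u = evidential_head(logits)
print(u)  # the second row gets markedly higher epistemic uncertainty
```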
ToolFlowNet: Robotic Manipulation with Tools via Predicting Tool Flow from Point Clouds
Conference on Robot Learning Pub Date: 2022-11-16 DOI: 10.48550/arXiv.2211.09006
Daniel Seita, Yufei Wang, Sarthak J. Shetty, Edward Li, Zackory M. Erickson, David Held
{"title":"ToolFlowNet: Robotic Manipulation with Tools via Predicting Tool Flow from Point Clouds","authors":"Daniel Seita, Yufei Wang, Sarthak J. Shetty, Edward Li, Zackory M. Erickson, David Held","doi":"10.48550/arXiv.2211.09006","DOIUrl":"https://doi.org/10.48550/arXiv.2211.09006","url":null,"abstract":"Point clouds are a widely available and canonical data modality which convey the 3D geometry of a scene. Despite significant progress in classification and segmentation from point clouds, policy learning from such a modality remains challenging, and most prior works in imitation learning focus on learning policies from images or state information. In this paper, we propose a novel framework for learning policies from point clouds for robotic manipulation with tools. We use a novel neural network, ToolFlowNet, which predicts dense per-point flow on the tool that the robot controls, and then uses the flow to derive the transformation that the robot should execute. We apply this framework to imitation learning of challenging deformable object manipulation tasks with continuous movement of tools, including scooping and pouring, and demonstrate significantly improved performance over baselines which do not use flow. We perform 50 physical scooping experiments with ToolFlowNet and attain 82% scooping success. See https://tinyurl.com/toolflownet for supplementary material.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127695824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
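The flow-to-transform step is the same SVD-based least-squares rigid fit used in point registration. A sketch, with the network's predicted flow replaced by a synthetic one for the round-trip check:

```python
import numpy as np

def flow_to_transform(pts, flow):
    """pts: (N,3) tool points; flow: (N,3) predicted per-point motion."""
    tgt = pts + flow
    mp, mt = pts.mean(0), tgt.mean(0)
    H = (pts - mp).T @ (tgt - mt)              # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mt - R @ mp        # executed as a delta end-effector pose

pts = np.random.rand(256, 3)
th = np.deg2rad(5)               # synthetic flow: small rotation + shift
Rz = np.array([[np.cos(th), -np.sin(th), 0],
               [np.sin(th),  np.cos(th), 0],
               [0, 0, 1]])
R, t = flow_to_transform(pts, (pts @ Rz.T + 0.01) - pts)
assert np.allclose(R, Rz, atol=1e-6)
```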
Towards Long-Tailed 3D Detection
Conference on Robot Learning Pub Date: 2022-11-16 DOI: 10.48550/arXiv.2211.08691
Neehar Peri, Achal Dave, Deva Ramanan, Shu Kong
{"title":"Towards Long-Tailed 3D Detection","authors":"Neehar Peri, Achal Dave, Deva Ramanan, Shu Kong","doi":"10.48550/arXiv.2211.08691","DOIUrl":"https://doi.org/10.48550/arXiv.2211.08691","url":null,"abstract":"Contemporary autonomous vehicle (AV) benchmarks have advanced techniques for training 3D detectors, particularly on large-scale lidar data. Surprisingly, although semantic class labels naturally follow a long-tailed distribution, contemporary benchmarks focus on only a few common classes (e.g., pedestrian and car) and neglect many rare classes in-the-tail (e.g., debris and stroller). However, AVs must still detect rare classes to ensure safe operation. Moreover, semantic classes are often organized within a hierarchy, e.g., tail classes such as child and construction-worker are arguably subclasses of pedestrian. However, such hierarchical relationships are often ignored, which may lead to misleading estimates of performance and missed opportunities for algorithmic innovation. We address these challenges by formally studying the problem of Long-Tailed 3D Detection (LT3D), which evaluates on all classes, including those in-the-tail. We evaluate and innovate upon popular 3D detection codebases, such as CenterPoint and PointPillars, adapting them for LT3D. We develop hierarchical losses that promote feature sharing across common-vs-rare classes, as well as improved detection metrics that award partial credit to\"reasonable\"mistakes respecting the hierarchy (e.g., mistaking a child for an adult). Finally, we point out that fine-grained tail class accuracy is particularly improved via multimodal fusion of RGB images with LiDAR; simply put, small fine-grained classes are challenging to identify from sparse (lidar) geometry alone, suggesting that multimodal cues are crucial to long-tailed 3D detection. Our modifications improve accuracy by 5% AP on average for all classes, and dramatically improve AP for rare classes (e.g., stroller AP improves from 3.6 to 31.6)! Our code is available at https://github.com/neeharperi/LT3D","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134280266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
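The partial-credit idea can be sketched with a toy two-level hierarchy; the 0.5 credit value and the class tree below are illustrative, not the paper's actual metric.

```python
# Illustrative two-level hierarchy; the paper's class tree differs.
PARENT = {"adult": "pedestrian", "child": "pedestrian",
          "construction-worker": "pedestrian",
          "car": "vehicle", "truck": "vehicle"}

def match_credit(pred_cls, gt_cls, partial=0.5):
    """Full credit for exact matches; partial credit for mistakes that stay
    within the same branch of the hierarchy; none otherwise."""
    if pred_cls == gt_cls:
        return 1.0
    pp, gp = PARENT.get(pred_cls), PARENT.get(gt_cls)
    if pp is not None and pp == gp:
        return partial        # a "reasonable" mistake, e.g. child vs. adult
    return 0.0

print(match_credit("adult", "child"))  # 0.5
print(match_credit("car", "child"))    # 0.0
```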
Legged Locomotion in Challenging Terrains using Egocentric Vision
Conference on Robot Learning Pub Date: 2022-11-14 DOI: 10.48550/arXiv.2211.07638
Ananye Agarwal, Ashish Kumar, Jitendra Malik, Deepak Pathak
{"title":"Legged Locomotion in Challenging Terrains using Egocentric Vision","authors":"Ananye Agarwal, Ashish Kumar, Jitendra Malik, Deepak Pathak","doi":"10.48550/arXiv.2211.07638","DOIUrl":"https://doi.org/10.48550/arXiv.2211.07638","url":null,"abstract":"Animals are capable of precise and agile locomotion using vision. Replicating this ability has been a long-standing goal in robotics. The traditional approach has been to decompose this problem into elevation mapping and foothold planning phases. The elevation mapping, however, is susceptible to failure and large noise artifacts, requires specialized hardware, and is biologically implausible. In this paper, we present the first end-to-end locomotion system capable of traversing stairs, curbs, stepping stones, and gaps. We show this result on a medium-sized quadruped robot using a single front-facing depth camera. The small size of the robot necessitates discovering specialized gait patterns not seen elsewhere. The egocentric camera requires the policy to remember past information to estimate the terrain under its hind feet. We train our policy in simulation. Training has two phases - first, we train a policy using reinforcement learning with a cheap-to-compute variant of depth image and then in phase 2 distill it into the final policy that uses depth using supervised learning. The resulting policy transfers to the real world and is able to run in real-time on the limited compute of the robot. It can traverse a large variety of terrain while being robust to perturbations like pushes, slippery surfaces, and rocky terrain. Videos are at https://vision-locomotion.github.io","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"312 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115365378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 65
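A toy rendering of phase 2, under stated assumptions: the frozen random "teacher" stands in for the phase-1 RL policy, random tensors stand in for rollout observations, and the MLPs are not the paper's architecture. The point is the supervised distillation loss from cheap-depth teacher to raw-depth student.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
PROPRIO, DEPTH, CHEAP, ACT = 32, 64 * 64, 128, 12
teacher = nn.Sequential(nn.Linear(PROPRIO + CHEAP, 256), nn.ELU(),
                        nn.Linear(256, ACT))      # frozen phase-1 policy
for prm in teacher.parameters():
    prm.requires_grad_(False)
student = nn.Sequential(nn.Linear(PROPRIO + DEPTH, 256), nn.ELU(),
                        nn.Linear(256, ACT))      # consumes raw depth
opt = torch.optim.Adam(student.parameters(), lr=3e-4)

for step in range(1000):
    proprio = torch.randn(64, PROPRIO)    # stand-ins for rollout batches
    cheap = torch.randn(64, CHEAP)        # cheap-to-compute depth variant
    depth = torch.randn(64, DEPTH)        # full depth image, flattened
    with torch.no_grad():
        target = teacher(torch.cat([proprio, cheap], -1))
    pred = student(torch.cat([proprio, depth], -1))
    loss = (pred - target).pow(2).mean()  # supervised distillation loss
    opt.zero_grad(); loss.backward(); opt.step()
```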