Title: InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation
Authors: Andrew Lee, Ian Chuang, Ling-Yuan Chen, Iman Soltani
DOI: https://doi.org/arxiv-2409.07914 (arXiv:2409.07914)
Published: 2024-09-12, arXiv - CS - Robotics
Abstract: We present InterACT: Inter-dependency aware Action Chunking with Hierarchical Attention Transformers, a novel imitation learning framework for bimanual manipulation that integrates hierarchical attention to capture inter-dependencies between dual-arm joint states and visual inputs. InterACT consists of a Hierarchical Attention Encoder and a Multi-arm Decoder, both designed to enhance information aggregation and coordination. The encoder processes multi-modal inputs through segment-wise and cross-segment attention mechanisms, while the decoder leverages synchronization blocks to refine individual action predictions, providing the counterpart's prediction as context. Our experiments on a variety of simulated and real-world bimanual manipulation tasks demonstrate that InterACT significantly outperforms existing methods. Detailed ablation studies validate the contributions of key components of our work, including the impact of CLS tokens, cross-segment encoders, and synchronization blocks.
Title: Universal Trajectory Optimization Framework for Differential-Driven Robot Class
Authors: Mengke Zhang, Zhichao Han, Chao Xu, Fei Gao, Yanjun Cao
DOI: https://doi.org/arxiv-2409.07924 (arXiv:2409.07924)
Published: 2024-09-12, arXiv - CS - Robotics
Abstract: Differential-driven robots are widely used in various scenarios, from household service robots to disaster-response field robots, thanks to their straightforward drive principle. Several types of driving mechanisms appear in real-world applications, including two-wheeled, four-wheeled skid-steering, and tracked robots. The differences in driving mechanism usually require specific kinematic modeling when precise control is desired. Furthermore, the nonholonomic dynamics and possible lateral slip lead to varying degrees of difficulty in obtaining feasible, high-quality trajectories. A comprehensive trajectory optimization framework that computes trajectories efficiently for various kinds of differential-driven robots is therefore highly desirable. In this paper, we propose a universal trajectory optimization framework that can be applied to the differential-driven robot class, enabling the generation of high-quality trajectories within a restricted computational timeframe. We introduce a novel trajectory representation based on polynomial parameterization of motion states or their integrals, such as angular and linear velocities, which inherently matches the robots' motion to the control principle of the differential-driven robot class. The trajectory optimization problem is formulated to minimize complexity while prioritizing safety and operational efficiency. We then build a full-stack autonomous planning and control system to demonstrate its feasibility and robustness. We conduct extensive simulations and real-world testing in crowded environments with three kinds of differential-driven robots to validate the effectiveness of our approach. We will release our method as an open-source package.
Title: Graph Inspection for Robotic Motion Planning: Do Arithmetic Circuits Help?
Authors: Matthias Bentert, Daniel Coimbra Salomao, Alex Crane, Yosuke Mizutani, Felix Reidl, Blair D. Sullivan
DOI: https://doi.org/arxiv-2409.08219 (arXiv:2409.08219)
Published: 2024-09-12, arXiv - CS - Robotics
Abstract: We investigate whether algorithms based on arithmetic circuits are a viable alternative to existing solvers for Graph Inspection, a problem with direct application in robotic motion planning. Specifically, we seek to address the high memory usage of existing solvers. Aided by novel theoretical results enabling fast solution recovery, we implement a circuit-based solver for Graph Inspection which uses only polynomial space and test it on several realistic robotic motion planning datasets. In particular, we provide a comprehensive experimental evaluation of a suite of engineered algorithms for three key subroutines. While this evaluation demonstrates that circuit-based methods are not yet practically competitive for our robotics application, it also provides insights which may guide future efforts to bring circuit-based algorithms from theory to practice.
{"title":"Towards Online Safety Corrections for Robotic Manipulation Policies","authors":"Ariana Spalter, Mark Roberts, Laura M. Hiatt","doi":"arxiv-2409.08233","DOIUrl":"https://doi.org/arxiv-2409.08233","url":null,"abstract":"Recent successes in applying reinforcement learning (RL) for robotics has\u0000shown it is a viable approach for constructing robotic controllers. However, RL\u0000controllers can produce many collisions in environments where new obstacles\u0000appear during execution. This poses a problem in safety-critical settings. We\u0000present a hybrid approach, called iKinQP-RL, that uses an Inverse Kinematics\u0000Quadratic Programming (iKinQP) controller to correct actions proposed by an RL\u0000policy at runtime. This ensures safe execution in the presence of new obstacles\u0000not present during training. Preliminary experiments illustrate our iKinQP-RL\u0000framework completely eliminates collisions with new obstacles while maintaining\u0000a high task success rate.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Adaptive Language-Guided Abstraction from Contrastive Explanations
Authors: Andi Peng, Belinda Z. Li, Ilia Sucholutsky, Nishanth Kumar, Julie A. Shah, Jacob Andreas, Andreea Bobu
DOI: https://doi.org/arxiv-2409.08212 (arXiv:2409.08212)
Published: 2024-09-12, arXiv - CS - Robotics
Abstract: Many approaches to robot learning begin by inferring a reward function from a set of human demonstrations. To learn a good reward, it is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward. End-to-end methods for joint feature and reward learning (e.g., using deep networks or program synthesis techniques) often yield brittle reward functions that are sensitive to spurious state features. By contrast, humans can often generalizably learn from a small number of demonstrations by incorporating strong priors about what features of a demonstration are likely meaningful for a task of interest. How do we build robots that leverage this kind of background knowledge when learning from new demonstrations? This paper describes a method named ALGAE (Adaptive Language-Guided Abstraction from [Contrastive] Explanations), which alternates between using language models to iteratively identify human-meaningful features needed to explain demonstrated behavior and using standard inverse reinforcement learning techniques to assign weights to those features. Experiments across a variety of both simulated and real-world robot environments show that ALGAE learns generalizable reward functions defined on interpretable features using only small numbers of demonstrations. Importantly, ALGAE can recognize when features are missing, then extract and define those features without any human input, making it possible to quickly and efficiently acquire rich representations of user behavior.
{"title":"ReGentS: Real-World Safety-Critical Driving Scenario Generation Made Stable","authors":"Yuan Yin, Pegah Khayatan, Éloi Zablocki, Alexandre Boulch, Matthieu Cord","doi":"arxiv-2409.07830","DOIUrl":"https://doi.org/arxiv-2409.07830","url":null,"abstract":"Machine learning based autonomous driving systems often face challenges with\u0000safety-critical scenarios that are rare in real-world data, hindering their\u0000large-scale deployment. While increasing real-world training data coverage\u0000could address this issue, it is costly and dangerous. This work explores\u0000generating safety-critical driving scenarios by modifying complex real-world\u0000regular scenarios through trajectory optimization. We propose ReGentS, which\u0000stabilizes generated trajectories and introduces heuristics to avoid obvious\u0000collisions and optimization problems. Our approach addresses unrealistic\u0000diverging trajectories and unavoidable collision scenarios that are not useful\u0000for training robust planner. We also extend the scenario generation framework\u0000to handle real-world data with up to 32 agents. Additionally, by using a\u0000differentiable simulator, our approach simplifies gradient descent-based\u0000optimization involving a simulator, paving the way for future advancements. The\u0000code is available at https://github.com/valeoai/ReGentS.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Hand-Object Interaction Pretraining from Videos
Authors: Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik
DOI: https://doi.org/arxiv-2409.08273 (arXiv:2409.08273)
Published: 2024-09-12, arXiv - CS - Robotics
Abstract: We present an approach to learn general robot manipulation priors from 3D hand-object interaction trajectories. We build a framework that uses in-the-wild videos to generate sensorimotor robot trajectories. We do so by lifting both the human hand and the manipulated object into a shared 3D space and retargeting human motions to robot actions. Generative modeling on this data gives us a task-agnostic base policy. This policy captures a general yet flexible manipulation prior. We empirically demonstrate that finetuning this policy, with both reinforcement learning (RL) and behavior cloning (BC), enables sample-efficient adaptation to downstream tasks and simultaneously improves robustness and generalizability compared to prior approaches. Qualitative experiments are available at https://hgaurav2k.github.io/hop/.
Title: Real-time Multi-view Omnidirectional Depth Estimation System for Robots and Autonomous Driving on Real Scenes
Authors: Ming Li, Xiong Yang, Chaofan Wu, Jiaheng Li, Pinzhi Wang, Xuejiao Hu, Sidan Du, Yang Li
DOI: https://doi.org/arxiv-2409.07843 (arXiv:2409.07843)
Published: 2024-09-12, arXiv - CS - Robotics
Abstract: Omnidirectional depth estimation has broad application prospects in fields such as robotic navigation and autonomous driving. In this paper, we propose a robotic prototype system and corresponding algorithm designed to validate omnidirectional depth estimation for navigation and obstacle avoidance in real-world scenarios for both robots and vehicles. The proposed HexaMODE system captures 360° depth maps using six fisheye cameras arranged around the platform. We introduce a combined spherical sweeping method and optimize the model architecture of the proposed RtHexa-OmniMVS algorithm to achieve real-time omnidirectional depth estimation. To ensure high accuracy, robustness, and generalization in real-world environments, we employ a teacher-student self-training strategy, utilizing large-scale unlabeled real-world data for model training. The proposed algorithm demonstrates high accuracy in various complex real-world scenarios, both indoors and outdoors, achieving an inference speed of 15 fps on edge computing platforms.
Title: FIReStereo: Forest InfraRed Stereo Dataset for UAS Depth Perception in Visually Degraded Environments
Authors: Devansh Dhrafani, Yifei Liu, Andrew Jong, Ukcheol Shin, Yao He, Tyler Harp, Yaoyu Hu, Jean Oh, Sebastian Scherer
DOI: https://doi.org/arxiv-2409.07715 (arXiv:2409.07715)
Published: 2024-09-12, arXiv - CS - Robotics
Abstract: Robust depth perception in visually degraded environments is crucial for autonomous aerial systems. Thermal imaging cameras, which capture infrared radiation, are robust to visual degradation. However, due to the lack of a large-scale dataset, the use of thermal cameras for unmanned aerial system (UAS) depth perception has remained largely unexplored. This paper presents a stereo thermal depth perception dataset for autonomous aerial perception applications. The dataset consists of stereo thermal images, LiDAR, IMU, and ground-truth depth maps captured in urban and forest settings under diverse conditions such as day, night, rain, and smoke. We benchmark representative stereo depth estimation algorithms, offering insights into their performance in degraded conditions. Models trained on our dataset generalize well to unseen smoky conditions, highlighting the robustness of stereo thermal imaging for depth perception. We aim for this work to enhance robotic perception in disaster scenarios, allowing for exploration and operations in previously unreachable areas. The dataset and source code are available at https://firestereo.github.io.
Title: Relevance for Human Robot Collaboration
Authors: Xiaotong Zhang, Dingcheng Huang, Kamal Youcef-Toumi
DOI: https://doi.org/arxiv-2409.07753 (arXiv:2409.07753)
Published: 2024-09-12, arXiv - CS - Robotics
Abstract: Effective human-robot collaboration (HRC) requires robots to possess human-like intelligence. Inspired by humans' cognitive ability to selectively process and filter elements in complex environments, this paper introduces a novel concept and scene-understanding approach termed "relevance," which identifies the relevant components in a scene. To quantify relevance accurately and efficiently, we developed an event-based framework that selectively triggers relevance determination, along with a probabilistic methodology built on a structured scene representation. Simulation results demonstrate that the relevance framework and methodology accurately predict the relevance of a general HRC setup, achieving a precision of 0.99 and a recall of 0.94. Relevance can be broadly applied to several areas in HRC: it reduces task planning time by 79.56% compared with pure planning for a cereal task, reduces perception latency by up to 26.53% for an object detector, improves HRC safety by up to 13.50%, and reduces the number of inquiries for HRC by 75.36%. A real-world demonstration showcases the relevance framework's ability to intelligently assist humans in everyday tasks.