{"title":"Open-Set Semantic Uncertainty Aware Metric-Semantic Graph Matching","authors":"Kurran Singh, John J. Leonard","doi":"arxiv-2409.11555","DOIUrl":"https://doi.org/arxiv-2409.11555","url":null,"abstract":"Underwater object-level mapping requires incorporating visual foundation models to handle the uncommon and often previously unseen object classes encountered in marine scenarios. In this work, a metric of semantic uncertainty for open-set object detections produced by visual foundation models is calculated and then incorporated into an object-level uncertainty tracking framework. Object-level uncertainties and geometric relationships between objects are used to enable robust object-level loop closure detection for unknown object classes. The above loop closure detection problem is formulated as a graph-matching problem. While graph matching, in general, is NP-Complete, a solver for an equivalent formulation of the proposed graph matching problem as a graph editing problem is tested on multiple challenging underwater scenes. Results for this solver as well as three other solvers demonstrate that the proposed methods are feasible for real-time use in marine environments for the robust, open-set, multi-object, semantic-uncertainty-aware loop closure detection. Further experimental results on the KITTI dataset demonstrate that the method generalizes to large-scale terrestrial scenes.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UniLCD: Unified Local-Cloud Decision-Making via Reinforcement Learning","authors":"Kathakoli Sengupta, Zhongkai Shangguan, Sandesh Bharadwaj, Sanjay Arora, Eshed Ohn-Bar, Renato Mancuso","doi":"arxiv-2409.11403","DOIUrl":"https://doi.org/arxiv-2409.11403","url":null,"abstract":"Embodied vision-based real-world systems, such as mobile robots, require a careful balance between energy consumption, compute latency, and safety constraints to optimize operation across dynamic tasks and contexts. As local computation tends to be restricted, offloading the computation, i.e., to a remote server, can save local resources while providing access to high-quality predictions from powerful and large models. However, the resulting communication and latency overhead has led to limited usability of cloud models in dynamic, safety-critical, real-time settings. To effectively address this trade-off, we introduce UniLCD, a novel hybrid inference framework for enabling flexible local-cloud collaboration. By efficiently optimizing a flexible routing module via reinforcement learning and a suitable multi-task objective, UniLCD is specifically designed to support the multiple constraints of safety-critical end-to-end mobile systems. We validate the proposed approach using a challenging, crowded navigation task requiring frequent and timely switching between local and cloud operations. UniLCD improves overall performance and efficiency by over 35% compared to state-of-the-art baselines based on various split computing and early exit strategies.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MI-HGNN: Morphology-Informed Heterogeneous Graph Neural Network for Legged Robot Contact Perception","authors":"Daniel Butterfield, Sandilya Sai Garimella, Nai-Jen Cheng, Lu Gan","doi":"arxiv-2409.11146","DOIUrl":"https://doi.org/arxiv-2409.11146","url":null,"abstract":"We present a Morphology-Informed Heterogeneous Graph Neural Network (MI-HGNN) for learning-based contact perception. The architecture and connectivity of the MI-HGNN are constructed from the robot morphology, in which nodes and edges are robot joints and links, respectively. By incorporating the morphology-informed constraints into a neural network, we improve a learning-based approach using model-based knowledge. We apply the proposed MI-HGNN to two contact perception problems, and conduct extensive experiments using both real-world and simulated data collected using two quadruped robots. Our experiments demonstrate the superiority of our method in terms of effectiveness, generalization ability, model efficiency, and sample efficiency. Our MI-HGNN improved the performance of a state-of-the-art model that leverages robot morphological symmetry by 8.4% with only 0.21% of its parameters. Although MI-HGNN is applied to contact perception problems for legged robots in this work, it can be seamlessly applied to other types of multi-body dynamical systems and has the potential to improve other robot learning frameworks. Our code is made publicly available at https://github.com/lunarlab-gatech/Morphology-Informed-HGNN.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":"98 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SDP: Spiking Diffusion Policy for Robotic Manipulation with Learnable Channel-Wise Membrane Thresholds","authors":"Zhixing Hou, Maoxu Gao, Hang Yu, Mengyu Yang, Chio-In Ieong","doi":"arxiv-2409.11195","DOIUrl":"https://doi.org/arxiv-2409.11195","url":null,"abstract":"This paper introduces a Spiking Diffusion Policy (SDP) learning method for robotic manipulation by integrating Spiking Neurons and Learnable Channel-wise Membrane Thresholds (LCMT) into the diffusion policy model, thereby enhancing computational efficiency and achieving high performance in evaluated tasks. Specifically, the proposed SDP model employs the U-Net architecture as the backbone for diffusion learning within the Spiking Neural Network (SNN). It strategically places residual connections between the spike convolution operations and the Leaky Integrate-and-Fire (LIF) nodes, thereby preventing disruptions to the spiking states. Additionally, we introduce a temporal encoding block and a temporal decoding block to transform static and dynamic data with timestep $T_S$ into each other, enabling the transmission of data within the SNN in spike format. Furthermore, we propose LCMT to enable the adaptive acquisition of membrane potential thresholds, thereby matching the conditions of varying membrane potentials and firing rates across channels and avoiding the cumbersome process of manually setting and tuning hyperparameters. Evaluating the SDP model on seven distinct tasks with SNN timestep $T_S=4$, we achieve results comparable to those of the ANN counterparts, along with faster convergence speeds than the baseline SNN method. This improvement is accompanied by a reduction of 94.3% in dynamic energy consumption estimated on 45nm hardware.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":"64 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Annealed Winner-Takes-All for Motion Forecasting","authors":"Yihong Xu, Victor Letzelter, Mickaël Chen, Éloi Zablocki, Matthieu Cord","doi":"arxiv-2409.11172","DOIUrl":"https://doi.org/arxiv-2409.11172","url":null,"abstract":"In autonomous driving, motion prediction aims at forecasting the future trajectories of nearby agents, helping the ego vehicle to anticipate behaviors and drive safely. A key challenge is generating a diverse set of future predictions, commonly addressed using data-driven models with Multiple Choice Learning (MCL) architectures and Winner-Takes-All (WTA) training objectives. However, these methods face initialization sensitivity and training instabilities. Additionally, to compensate for limited performance, some approaches rely on training with a large set of hypotheses, requiring a post-selection step during inference to significantly reduce the number of predictions. To tackle these issues, we take inspiration from annealed MCL, a recently introduced technique that improves the convergence properties of MCL methods through an annealed Winner-Takes-All loss (aWTA). In this paper, we demonstrate how the aWTA loss can be integrated with state-of-the-art motion forecasting models to enhance their performance using only a minimal set of hypotheses, eliminating the need for the cumbersome post-selection step. Our approach can be easily incorporated into any trajectory prediction model normally trained using WTA and yields significant improvements. To facilitate the application of our approach to future motion forecasting models, the code will be made publicly available upon acceptance: https://github.com/valeoai/MF_aWTA.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The 1st InterAI Workshop: Interactive AI for Human-centered Robotics","authors":"Yuchong Zhang, Elmira Yadollahi, Yong Ma, Di Fu, Iolanda Leite, Danica Kragic","doi":"arxiv-2409.11150","DOIUrl":"https://doi.org/arxiv-2409.11150","url":null,"abstract":"The workshop is affiliated with the 33rd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2024), held August 26-30, 2024, in Pasadena, CA, USA. It is designed as a half-day event running from 9:00 to 12:30 PST. It accommodates both in-person and virtual attendees (via Zoom), ensuring a flexible participation mode. The agenda is thoughtfully crafted to include a diverse range of sessions: two keynote speeches that promise to provide insightful perspectives, two dedicated paper presentation sessions, an interactive panel discussion to foster dialogue among experts and facilitate deeper dives into specific topics, and a 15-minute coffee break. Workshop website: https://sites.google.com/view/interaiworkshops/home.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":"59 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Air-FAR: Fast and Adaptable Routing for Aerial Navigation in Large-scale Complex Unknown Environments","authors":"Botao He, Guofei Chen, Cornelia Fermuller, Yiannis Aloimonos, Ji Zhang","doi":"arxiv-2409.11188","DOIUrl":"https://doi.org/arxiv-2409.11188","url":null,"abstract":"This paper presents a novel method for real-time 3D navigation in large-scale, complex environments using a hierarchical 3D visibility graph (V-graph). The proposed algorithm addresses the computational challenges of V-graph construction and shortest path search on the graph simultaneously. By introducing hierarchical 3D V-graph construction with heuristic visibility update, the 3D V-graph is constructed in O(K n^2 log n) time, which guarantees real-time performance. The proposed iterative divide-and-conquer path search method can achieve near-optimal path solutions within the constraints of real-time operations. The algorithm ensures efficient 3D V-graph construction and path search. Extensive experiments in simulated and real-world environments validate that our algorithm reduces the travel time by 42%, achieves up to 24.8% higher trajectory efficiency, and runs faster than most benchmarks by orders of magnitude in complex environments. The code and developed simulator have been open-sourced to facilitate future research.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Water Quality Mapping using Invariant Extended Kalman Filtering for Underwater Robot Localization","authors":"Kaustubh Joshi, Tianchen Liu, Alan Williams, Matthew Gray, Xiaomin Lin, Nikhil Chopra","doi":"arxiv-2409.11578","DOIUrl":"https://doi.org/arxiv-2409.11578","url":null,"abstract":"Water quality mapping for critical parameters such as temperature, salinity, and turbidity is crucial for assessing an aquaculture farm's health and yield capacity. Traditional approaches involve using boats or human divers, which are time-constrained and lack depth variability. This work presents an innovative approach to 3D water quality mapping in shallow water environments using a BlueROV2 equipped with GPS and a water quality sensor. This system allows for accurate location correction by resurfacing when errors occur. This study is being conducted at an oyster farm in the Chesapeake Bay, USA, providing a more comprehensive and precise water quality analysis in aquaculture settings.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resilient and Adaptive Replanning for Multi-Robot Target Tracking with Sensing and Communication Danger Zones","authors":"Peihan Li, Yuwei Wu, Jiazhen Liu, Gaurav S. Sukhatme, Vijay Kumar, Lifeng Zhou","doi":"arxiv-2409.11230","DOIUrl":"https://doi.org/arxiv-2409.11230","url":null,"abstract":"Multi-robot collaboration for target tracking presents significant challenges in hazardous environments, including addressing robot failures, dynamic priority changes, and other unpredictable factors. Moreover, these challenges are increased in adversarial settings if the environment is unknown. In this paper, we propose a resilient and adaptive framework for multi-robot, multi-target tracking in environments with unknown sensing and communication danger zones. The damages posed by these zones are temporary, allowing robots to track targets while accepting the risk of entering dangerous areas. We formulate the problem as an optimization with soft chance constraints, enabling real-time adjustments to robot behavior based on varying types of dangers and failures. An adaptive replanning strategy is introduced, featuring different triggers to improve group performance. This approach allows for dynamic prioritization of target tracking and risk aversion or resilience, depending on evolving resources and real-time conditions. To validate the effectiveness of the proposed method, we benchmark and evaluate it across multiple scenarios in simulation and conduct several real-world experiments.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VertiEncoder: Self-Supervised Kinodynamic Representation Learning on Vertically Challenging Terrain","authors":"Mohammad Nazeri, Aniket Datar, Anuj Pokhrel, Chenhui Pan, Garrett Warnell, Xuesu Xiao","doi":"arxiv-2409.11570","DOIUrl":"https://doi.org/arxiv-2409.11570","url":null,"abstract":"We present VertiEncoder, a self-supervised representation learning approach for robot mobility on vertically challenging terrain. Using the same pre-training process, VertiEncoder can handle four different downstream tasks, including forward kinodynamics learning, inverse kinodynamics learning, behavior cloning, and patch reconstruction with a single representation. VertiEncoder uses a TransformerEncoder to learn the local context of its surroundings by random masking and next patch reconstruction. We show that VertiEncoder achieves better performance across all four different tasks compared to specialized End-to-End models with 77% fewer parameters. We also show VertiEncoder's comparable performance against state-of-the-art kinodynamic modeling and planning approaches in real-world robot deployment. These results underscore the efficacy of VertiEncoder in mitigating overfitting and fostering more robust generalization across diverse environmental contexts and downstream vehicle kinodynamic tasks.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}