{"title":"Physically-Based Photometric Bundle Adjustment in Non-Lambertian Environments","authors":"Lei Cheng, Junpeng Hu, Haodong Yan, Mariia Gladkova, Tianyu Huang, Yun-Hui Liu, Daniel Cremers, Haoang Li","doi":"arxiv-2409.11854","DOIUrl":"https://doi.org/arxiv-2409.11854","url":null,"abstract":"Photometric bundle adjustment (PBA) is widely used in estimating the camera\u0000pose and 3D geometry by assuming a Lambertian world. However, the assumption of\u0000photometric consistency is often violated since the non-diffuse reflection is\u0000common in real-world environments. The photometric inconsistency significantly\u0000affects the reliability of existing PBA methods. To solve this problem, we\u0000propose a novel physically-based PBA method. Specifically, we introduce the\u0000physically-based weights regarding material, illumination, and light path.\u0000These weights distinguish the pixel pairs with different levels of photometric\u0000inconsistency. We also design corresponding models for material estimation\u0000based on sequential images and illumination estimation based on point clouds.\u0000In addition, we establish the first SLAM-related dataset of non-Lambertian\u0000scenes with complete ground truth of illumination and material. Extensive\u0000experiments demonstrated that our PBA method outperforms existing approaches in\u0000accuracy.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bundle Adjustment in the Eager Mode","authors":"Zitong Zhan, Huan Xu, Zihang Fang, Xinpeng Wei, Yaoyu Hu, Chen Wang","doi":"arxiv-2409.12190","DOIUrl":"https://doi.org/arxiv-2409.12190","url":null,"abstract":"Bundle adjustment (BA) is a critical technique in various robotic\u0000applications, such as simultaneous localization and mapping (SLAM), augmented\u0000reality (AR), and photogrammetry. BA optimizes parameters such as camera poses\u0000and 3D landmarks to align them with observations. With the growing importance\u0000of deep learning in perception systems, there is an increasing need to\u0000integrate BA with deep learning frameworks for enhanced reliability and\u0000performance. However, widely-used C++-based BA frameworks, such as GTSAM,\u0000g$^2$o, and Ceres, lack native integration with modern deep learning libraries\u0000like PyTorch. This limitation affects their flexibility, adaptability, ease of\u0000debugging, and overall implementation efficiency. To address this gap, we\u0000introduce an eager-mode BA framework seamlessly integrated with PyPose,\u0000providing PyTorch-compatible interfaces with high efficiency. Our approach\u0000includes GPU-accelerated, differentiable, and sparse operations designed for\u00002nd-order optimization, Lie group and Lie algebra operations, and linear\u0000solvers. Our eager-mode BA on GPU demonstrates substantial runtime efficiency,\u0000achieving an average speedup of 18.5$times$, 22$times$, and 23$times$\u0000compared to GTSAM, g$^2$o, and Ceres, respectively.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control","authors":"Zichen Jeff Cui, Hengkai Pan, Aadhithya Iyer, Siddhant Haldar, Lerrel Pinto","doi":"arxiv-2409.12192","DOIUrl":"https://doi.org/arxiv-2409.12192","url":null,"abstract":"Imitation learning has proven to be a powerful tool for training complex\u0000visuomotor policies. However, current methods often require hundreds to\u0000thousands of expert demonstrations to handle high-dimensional visual\u0000observations. A key reason for this poor data efficiency is that visual\u0000representations are predominantly either pretrained on out-of-domain data or\u0000trained directly through a behavior cloning objective. In this work, we present\u0000DynaMo, a new in-domain, self-supervised method for learning visual\u0000representations. Given a set of expert demonstrations, we jointly learn a\u0000latent inverse dynamics model and a forward dynamics model over a sequence of\u0000image embeddings, predicting the next frame in latent space, without\u0000augmentations, contrastive sampling, or access to ground truth actions.\u0000Importantly, DynaMo does not require any out-of-domain data such as Internet\u0000datasets or cross-embodied datasets. On a suite of six simulated and real\u0000environments, we show that representations learned with DynaMo significantly\u0000improve downstream imitation learning performance over prior self-supervised\u0000learning objectives, and pretrained representations. Gains from using DynaMo\u0000hold across policy classes such as Behavior Transformer, Diffusion Policy, MLP,\u0000and nearest neighbors. Finally, we ablate over key components of DynaMo and\u0000measure its impact on downstream policy performance. Robot videos are best\u0000viewed at https://dynamo-ssl.github.io","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Global Localization using Multi-Modal Object-Instance Re-Identification","authors":"Aneesh Chavan, Vaibhav Agrawal, Vineeth Bhat, Sarthak Chittawar, Siddharth Srivastava, Chetan Arora, K Madhava Krishna","doi":"arxiv-2409.12002","DOIUrl":"https://doi.org/arxiv-2409.12002","url":null,"abstract":"Re-identification (ReID) is a critical challenge in computer vision,\u0000predominantly studied in the context of pedestrians and vehicles. However,\u0000robust object-instance ReID, which has significant implications for tasks such\u0000as autonomous exploration, long-term perception, and scene understanding,\u0000remains underexplored. In this work, we address this gap by proposing a novel\u0000dual-path object-instance re-identification transformer architecture that\u0000integrates multimodal RGB and depth information. By leveraging depth data, we\u0000demonstrate improvements in ReID across scenes that are cluttered or have\u0000varying illumination conditions. Additionally, we develop a ReID-based\u0000localization framework that enables accurate camera localization and pose\u0000identification across different viewpoints. We validate our methods using two\u0000custom-built RGB-D datasets, as well as multiple sequences from the open-source\u0000TUM RGB-D datasets. Our approach demonstrates significant improvements in both\u0000object instance ReID (mAP of 75.18) and localization accuracy (success rate of\u000083% on TUM-RGBD), highlighting the essential role of object ReID in advancing\u0000robotic perception. Our models, frameworks, and datasets have been made\u0000publicly available.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RaggeDi: Diffusion-based State Estimation of Disordered Rags, Sheets, Towels and Blankets","authors":"Jikai Ye, Wanze Li, Shiraz Khan, Gregory S. Chirikjian","doi":"arxiv-2409.11831","DOIUrl":"https://doi.org/arxiv-2409.11831","url":null,"abstract":"Cloth state estimation is an important problem in robotics. It is essential\u0000for the robot to know the accurate state to manipulate cloth and execute tasks\u0000such as robotic dressing, stitching, and covering/uncovering human beings.\u0000However, estimating cloth state accurately remains challenging due to its high\u0000flexibility and self-occlusion. This paper proposes a diffusion model-based\u0000pipeline that formulates the cloth state estimation as an image generation\u0000problem by representing the cloth state as an RGB image that describes the\u0000point-wise translation (translation map) between a pre-defined flattened mesh\u0000and the deformed mesh in a canonical space. Then we train a conditional\u0000diffusion-based image generation model to predict the translation map based on\u0000an observation. Experiments are conducted in both simulation and the real world\u0000to validate the performance of our method. Results indicate that our method\u0000outperforms two recent methods in both accuracy and speed.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bi-objective trail-planning for a robot team orienteering in a hazardous environment","authors":"Cory M. Simon, Jeffrey Richley, Lucas Overbey, Darleen Perez-Lavin","doi":"arxiv-2409.12114","DOIUrl":"https://doi.org/arxiv-2409.12114","url":null,"abstract":"Teams of mobile [aerial, ground, or aquatic] robots have applications in\u0000resource delivery, patrolling, information-gathering, agriculture, forest fire\u0000fighting, chemical plume source localization and mapping, and\u0000search-and-rescue. Robot teams traversing hazardous environments -- with e.g.\u0000rough terrain or seas, strong winds, or adversaries capable of attacking or\u0000capturing robots -- should plan and coordinate their trails in consideration of\u0000risks of disablement, destruction, or capture. Specifically, the robots should\u0000take the safest trails, coordinate their trails to cooperatively achieve the\u0000team-level objective with robustness to robot failures, and balance the reward\u0000from visiting locations against risks of robot losses. Herein, we consider\u0000bi-objective trail-planning for a mobile team of robots orienteering in a\u0000hazardous environment. The hazardous environment is abstracted as a directed\u0000graph whose arcs, when traversed by a robot, present known probabilities of\u0000survival. Each node of the graph offers a reward to the team if visited by a\u0000robot (which e.g. delivers a good to or images the node). We wish to search for\u0000the Pareto-optimal robot-team trail plans that maximize two [conflicting] team\u0000objectives: the expected (i) team reward and (ii) number of robots that survive\u0000the mission. A human decision-maker can then select trail plans that balance,\u0000according to their values, reward and robot survival. We implement ant colony\u0000optimization, guided by heuristics, to search for the Pareto-optimal set of\u0000robot team trail plans. As a case study, we illustrate with an\u0000information-gathering mission in an art museum.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots","authors":"Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li","doi":"arxiv-2409.11905","DOIUrl":"https://doi.org/arxiv-2409.11905","url":null,"abstract":"This paper presents AlignBot, a novel framework designed to optimize\u0000VLM-powered customized task planning for household robots by effectively\u0000aligning with user reminders. In domestic settings, aligning task planning with\u0000user reminders poses significant challenges due to the limited quantity,\u0000diversity, and multimodal nature of the reminders. To address these challenges,\u0000AlignBot employs a fine-tuned LLaVA-7B model, functioning as an adapter for\u0000GPT-4o. This adapter model internalizes diverse forms of user reminders-such as\u0000personalized preferences, corrective guidance, and contextual assistance-into\u0000structured instruction-formatted cues that prompt GPT-4o in generating\u0000customized task plans. Additionally, AlignBot integrates a dynamic retrieval\u0000mechanism that selects task-relevant historical successes as prompts for\u0000GPT-4o, further enhancing task planning accuracy. To validate the effectiveness\u0000of AlignBot, experiments are conducted in real-world household environments,\u0000which are constructed within the laboratory to replicate typical household\u0000settings. A multimodal dataset with over 1,500 entries derived from volunteer\u0000reminders is used for training and evaluation. The results demonstrate that\u0000AlignBot significantly improves customized task planning, outperforming\u0000existing LLM- and VLM-powered planners by interpreting and aligning with user\u0000reminders, achieving 86.8% success rate compared to the vanilla GPT-4o baseline\u0000at 21.6%, reflecting a 65% improvement and over four times greater\u0000effectiveness. Supplementary materials are available at:\u0000https://yding25.com/AlignBot/","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RoboMorph: In-Context Meta-Learning for Robot Dynamics Modeling","authors":"Manuel Bianchi Bazzi, Asad Ali Shahid, Christopher Agia, John Alora, Marco Forgione, Dario Piga, Francesco Braghin, Marco Pavone, Loris Roveda","doi":"arxiv-2409.11815","DOIUrl":"https://doi.org/arxiv-2409.11815","url":null,"abstract":"The landscape of Deep Learning has experienced a major shift with the\u0000pervasive adoption of Transformer-based architectures, particularly in Natural\u0000Language Processing (NLP). Novel avenues for physical applications, such as\u0000solving Partial Differential Equations and Image Vision, have been explored.\u0000However, in challenging domains like robotics, where high non-linearity poses\u0000significant challenges, Transformer-based applications are scarce. While\u0000Transformers have been used to provide robots with knowledge about high-level\u0000tasks, few efforts have been made to perform system identification. This paper\u0000proposes a novel methodology to learn a meta-dynamical model of a\u0000high-dimensional physical system, such as the Franka robotic arm, using a\u0000Transformer-based architecture without prior knowledge of the system's physical\u0000parameters. The objective is to predict quantities of interest (end-effector\u0000pose and joint positions) given the torque signals for each joint. This\u0000prediction can be useful as a component for Deep Model Predictive Control\u0000frameworks in robotics. The meta-model establishes the correlation between\u0000torques and positions and predicts the output for the complete trajectory. This\u0000work provides empirical evidence of the efficacy of the in-context learning\u0000paradigm, suggesting future improvements in learning the dynamics of robotic\u0000systems without explicit knowledge of physical parameters. Code, videos, and\u0000supplementary materials can be found at project website. See\u0000https://sites.google.com/view/robomorph/","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human-Robot Cooperative Piano Playing with Learning-Based Real-Time Music Accompaniment","authors":"Huijiang Wang, Xiaoping Zhang, Fumiya Iida","doi":"arxiv-2409.11952","DOIUrl":"https://doi.org/arxiv-2409.11952","url":null,"abstract":"Recent advances in machine learning have paved the way for the development of\u0000musical and entertainment robots. However, human-robot cooperative instrument\u0000playing remains a challenge, particularly due to the intricate motor\u0000coordination and temporal synchronization. In this paper, we propose a\u0000theoretical framework for human-robot cooperative piano playing based on\u0000non-verbal cues. First, we present a music improvisation model that employs a\u0000recurrent neural network (RNN) to predict appropriate chord progressions based\u0000on the human's melodic input. Second, we propose a behavior-adaptive controller\u0000to facilitate seamless temporal synchronization, allowing the cobot to generate\u0000harmonious acoustics. The collaboration takes into account the bidirectional\u0000information flow between the human and robot. We have developed an\u0000entropy-based system to assess the quality of cooperation by analyzing the\u0000impact of different communication modalities during human-robot collaboration.\u0000Experiments demonstrate that our RNN-based improvisation can achieve a 93%\u0000accuracy rate. Meanwhile, with the MPC adaptive controller, the robot could\u0000respond to the human teammate in homophony performances with real-time\u0000accompaniment. Our designed framework has been validated to be effective in\u0000allowing humans and robots to work collaboratively in the artistic\u0000piano-playing task.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reactive Collision Avoidance for Safe Agile Navigation","authors":"Alessandro Saviolo, Niko Picello, Rishabh Verma, Giuseppe Loianno","doi":"arxiv-2409.11962","DOIUrl":"https://doi.org/arxiv-2409.11962","url":null,"abstract":"Reactive collision avoidance is essential for agile robots navigating complex\u0000and dynamic environments, enabling real-time obstacle response. However, this\u0000task is inherently challenging because it requires a tight integration of\u0000perception, planning, and control, which traditional methods often handle\u0000separately, resulting in compounded errors and delays. This paper introduces a\u0000novel approach that unifies these tasks into a single reactive framework using\u0000solely onboard sensing and computing. Our method combines nonlinear model\u0000predictive control with adaptive control barrier functions, directly linking\u0000perception-driven constraints to real-time planning and control. Constraints\u0000are determined by using a neural network to refine noisy RGB-D data, enhancing\u0000depth accuracy, and selecting points with the minimum time-to-collision to\u0000prioritize the most immediate threats. To maintain a balance between safety and\u0000agility, a heuristic dynamically adjusts the optimization process, preventing\u0000overconstraints in real time. Extensive experiments with an agile quadrotor\u0000demonstrate effective collision avoidance across diverse indoor and outdoor\u0000environments, without requiring environment-specific tuning or explicit\u0000mapping.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}