{"title":"Physically-Based Photometric Bundle Adjustment in Non-Lambertian Environments","authors":"Lei Cheng, Junpeng Hu, Haodong Yan, Mariia Gladkova, Tianyu Huang, Yun-Hui Liu, Daniel Cremers, Haoang Li","doi":"arxiv-2409.11854","DOIUrl":"https://doi.org/arxiv-2409.11854","url":null,"abstract":"Photometric bundle adjustment (PBA) is widely used in estimating the camera\u0000pose and 3D geometry by assuming a Lambertian world. However, the assumption of\u0000photometric consistency is often violated since the non-diffuse reflection is\u0000common in real-world environments. The photometric inconsistency significantly\u0000affects the reliability of existing PBA methods. To solve this problem, we\u0000propose a novel physically-based PBA method. Specifically, we introduce the\u0000physically-based weights regarding material, illumination, and light path.\u0000These weights distinguish the pixel pairs with different levels of photometric\u0000inconsistency. We also design corresponding models for material estimation\u0000based on sequential images and illumination estimation based on point clouds.\u0000In addition, we establish the first SLAM-related dataset of non-Lambertian\u0000scenes with complete ground truth of illumination and material. Extensive\u0000experiments demonstrated that our PBA method outperforms existing approaches in\u0000accuracy.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bundle Adjustment in the Eager Mode","authors":"Zitong Zhan, Huan Xu, Zihang Fang, Xinpeng Wei, Yaoyu Hu, Chen Wang","doi":"arxiv-2409.12190","DOIUrl":"https://doi.org/arxiv-2409.12190","url":null,"abstract":"Bundle adjustment (BA) is a critical technique in various robotic\u0000applications, such as simultaneous localization and mapping (SLAM), augmented\u0000reality (AR), and photogrammetry. BA optimizes parameters such as camera poses\u0000and 3D landmarks to align them with observations. With the growing importance\u0000of deep learning in perception systems, there is an increasing need to\u0000integrate BA with deep learning frameworks for enhanced reliability and\u0000performance. However, widely-used C++-based BA frameworks, such as GTSAM,\u0000g$^2$o, and Ceres, lack native integration with modern deep learning libraries\u0000like PyTorch. This limitation affects their flexibility, adaptability, ease of\u0000debugging, and overall implementation efficiency. To address this gap, we\u0000introduce an eager-mode BA framework seamlessly integrated with PyPose,\u0000providing PyTorch-compatible interfaces with high efficiency. Our approach\u0000includes GPU-accelerated, differentiable, and sparse operations designed for\u00002nd-order optimization, Lie group and Lie algebra operations, and linear\u0000solvers. Our eager-mode BA on GPU demonstrates substantial runtime efficiency,\u0000achieving an average speedup of 18.5$times$, 22$times$, and 23$times$\u0000compared to GTSAM, g$^2$o, and Ceres, respectively.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control","authors":"Zichen Jeff Cui, Hengkai Pan, Aadhithya Iyer, Siddhant Haldar, Lerrel Pinto","doi":"arxiv-2409.12192","DOIUrl":"https://doi.org/arxiv-2409.12192","url":null,"abstract":"Imitation learning has proven to be a powerful tool for training complex\u0000visuomotor policies. However, current methods often require hundreds to\u0000thousands of expert demonstrations to handle high-dimensional visual\u0000observations. A key reason for this poor data efficiency is that visual\u0000representations are predominantly either pretrained on out-of-domain data or\u0000trained directly through a behavior cloning objective. In this work, we present\u0000DynaMo, a new in-domain, self-supervised method for learning visual\u0000representations. Given a set of expert demonstrations, we jointly learn a\u0000latent inverse dynamics model and a forward dynamics model over a sequence of\u0000image embeddings, predicting the next frame in latent space, without\u0000augmentations, contrastive sampling, or access to ground truth actions.\u0000Importantly, DynaMo does not require any out-of-domain data such as Internet\u0000datasets or cross-embodied datasets. On a suite of six simulated and real\u0000environments, we show that representations learned with DynaMo significantly\u0000improve downstream imitation learning performance over prior self-supervised\u0000learning objectives, and pretrained representations. Gains from using DynaMo\u0000hold across policy classes such as Behavior Transformer, Diffusion Policy, MLP,\u0000and nearest neighbors. Finally, we ablate over key components of DynaMo and\u0000measure its impact on downstream policy performance. Robot videos are best\u0000viewed at https://dynamo-ssl.github.io","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Global Localization using Multi-Modal Object-Instance Re-Identification","authors":"Aneesh Chavan, Vaibhav Agrawal, Vineeth Bhat, Sarthak Chittawar, Siddharth Srivastava, Chetan Arora, K Madhava Krishna","doi":"arxiv-2409.12002","DOIUrl":"https://doi.org/arxiv-2409.12002","url":null,"abstract":"Re-identification (ReID) is a critical challenge in computer vision,\u0000predominantly studied in the context of pedestrians and vehicles. However,\u0000robust object-instance ReID, which has significant implications for tasks such\u0000as autonomous exploration, long-term perception, and scene understanding,\u0000remains underexplored. In this work, we address this gap by proposing a novel\u0000dual-path object-instance re-identification transformer architecture that\u0000integrates multimodal RGB and depth information. By leveraging depth data, we\u0000demonstrate improvements in ReID across scenes that are cluttered or have\u0000varying illumination conditions. Additionally, we develop a ReID-based\u0000localization framework that enables accurate camera localization and pose\u0000identification across different viewpoints. We validate our methods using two\u0000custom-built RGB-D datasets, as well as multiple sequences from the open-source\u0000TUM RGB-D datasets. Our approach demonstrates significant improvements in both\u0000object instance ReID (mAP of 75.18) and localization accuracy (success rate of\u000083% on TUM-RGBD), highlighting the essential role of object ReID in advancing\u0000robotic perception. Our models, frameworks, and datasets have been made\u0000publicly available.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RaggeDi: Diffusion-based State Estimation of Disordered Rags, Sheets, Towels and Blankets","authors":"Jikai Ye, Wanze Li, Shiraz Khan, Gregory S. Chirikjian","doi":"arxiv-2409.11831","DOIUrl":"https://doi.org/arxiv-2409.11831","url":null,"abstract":"Cloth state estimation is an important problem in robotics. It is essential\u0000for the robot to know the accurate state to manipulate cloth and execute tasks\u0000such as robotic dressing, stitching, and covering/uncovering human beings.\u0000However, estimating cloth state accurately remains challenging due to its high\u0000flexibility and self-occlusion. This paper proposes a diffusion model-based\u0000pipeline that formulates the cloth state estimation as an image generation\u0000problem by representing the cloth state as an RGB image that describes the\u0000point-wise translation (translation map) between a pre-defined flattened mesh\u0000and the deformed mesh in a canonical space. Then we train a conditional\u0000diffusion-based image generation model to predict the translation map based on\u0000an observation. Experiments are conducted in both simulation and the real world\u0000to validate the performance of our method. Results indicate that our method\u0000outperforms two recent methods in both accuracy and speed.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bi-objective trail-planning for a robot team orienteering in a hazardous environment","authors":"Cory M. Simon, Jeffrey Richley, Lucas Overbey, Darleen Perez-Lavin","doi":"arxiv-2409.12114","DOIUrl":"https://doi.org/arxiv-2409.12114","url":null,"abstract":"Teams of mobile [aerial, ground, or aquatic] robots have applications in\u0000resource delivery, patrolling, information-gathering, agriculture, forest fire\u0000fighting, chemical plume source localization and mapping, and\u0000search-and-rescue. Robot teams traversing hazardous environments -- with e.g.\u0000rough terrain or seas, strong winds, or adversaries capable of attacking or\u0000capturing robots -- should plan and coordinate their trails in consideration of\u0000risks of disablement, destruction, or capture. Specifically, the robots should\u0000take the safest trails, coordinate their trails to cooperatively achieve the\u0000team-level objective with robustness to robot failures, and balance the reward\u0000from visiting locations against risks of robot losses. Herein, we consider\u0000bi-objective trail-planning for a mobile team of robots orienteering in a\u0000hazardous environment. The hazardous environment is abstracted as a directed\u0000graph whose arcs, when traversed by a robot, present known probabilities of\u0000survival. Each node of the graph offers a reward to the team if visited by a\u0000robot (which e.g. delivers a good to or images the node). We wish to search for\u0000the Pareto-optimal robot-team trail plans that maximize two [conflicting] team\u0000objectives: the expected (i) team reward and (ii) number of robots that survive\u0000the mission. A human decision-maker can then select trail plans that balance,\u0000according to their values, reward and robot survival. We implement ant colony\u0000optimization, guided by heuristics, to search for the Pareto-optimal set of\u0000robot team trail plans. As a case study, we illustrate with an\u0000information-gathering mission in an art museum.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142267028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots","authors":"Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li","doi":"arxiv-2409.11905","DOIUrl":"https://doi.org/arxiv-2409.11905","url":null,"abstract":"This paper presents AlignBot, a novel framework designed to optimize\u0000VLM-powered customized task planning for household robots by effectively\u0000aligning with user reminders. In domestic settings, aligning task planning with\u0000user reminders poses significant challenges due to the limited quantity,\u0000diversity, and multimodal nature of the reminders. To address these challenges,\u0000AlignBot employs a fine-tuned LLaVA-7B model, functioning as an adapter for\u0000GPT-4o. This adapter model internalizes diverse forms of user reminders-such as\u0000personalized preferences, corrective guidance, and contextual assistance-into\u0000structured instruction-formatted cues that prompt GPT-4o in generating\u0000customized task plans. Additionally, AlignBot integrates a dynamic retrieval\u0000mechanism that selects task-relevant historical successes as prompts for\u0000GPT-4o, further enhancing task planning accuracy. To validate the effectiveness\u0000of AlignBot, experiments are conducted in real-world household environments,\u0000which are constructed within the laboratory to replicate typical household\u0000settings. A multimodal dataset with over 1,500 entries derived from volunteer\u0000reminders is used for training and evaluation. The results demonstrate that\u0000AlignBot significantly improves customized task planning, outperforming\u0000existing LLM- and VLM-powered planners by interpreting and aligning with user\u0000reminders, achieving 86.8% success rate compared to the vanilla GPT-4o baseline\u0000at 21.6%, reflecting a 65% improvement and over four times greater\u0000effectiveness. Supplementary materials are available at:\u0000https://yding25.com/AlignBot/","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RoboMorph: In-Context Meta-Learning for Robot Dynamics Modeling","authors":"Manuel Bianchi Bazzi, Asad Ali Shahid, Christopher Agia, John Alora, Marco Forgione, Dario Piga, Francesco Braghin, Marco Pavone, Loris Roveda","doi":"arxiv-2409.11815","DOIUrl":"https://doi.org/arxiv-2409.11815","url":null,"abstract":"The landscape of Deep Learning has experienced a major shift with the\u0000pervasive adoption of Transformer-based architectures, particularly in Natural\u0000Language Processing (NLP). Novel avenues for physical applications, such as\u0000solving Partial Differential Equations and Image Vision, have been explored.\u0000However, in challenging domains like robotics, where high non-linearity poses\u0000significant challenges, Transformer-based applications are scarce. While\u0000Transformers have been used to provide robots with knowledge about high-level\u0000tasks, few efforts have been made to perform system identification. This paper\u0000proposes a novel methodology to learn a meta-dynamical model of a\u0000high-dimensional physical system, such as the Franka robotic arm, using a\u0000Transformer-based architecture without prior knowledge of the system's physical\u0000parameters. The objective is to predict quantities of interest (end-effector\u0000pose and joint positions) given the torque signals for each joint. This\u0000prediction can be useful as a component for Deep Model Predictive Control\u0000frameworks in robotics. The meta-model establishes the correlation between\u0000torques and positions and predicts the output for the complete trajectory. This\u0000work provides empirical evidence of the efficacy of the in-context learning\u0000paradigm, suggesting future improvements in learning the dynamics of robotic\u0000systems without explicit knowledge of physical parameters. Code, videos, and\u0000supplementary materials can be found at project website. See\u0000https://sites.google.com/view/robomorph/","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human-Robot Cooperative Piano Playing with Learning-Based Real-Time Music Accompaniment","authors":"Huijiang Wang, Xiaoping Zhang, Fumiya Iida","doi":"arxiv-2409.11952","DOIUrl":"https://doi.org/arxiv-2409.11952","url":null,"abstract":"Recent advances in machine learning have paved the way for the development of\u0000musical and entertainment robots. However, human-robot cooperative instrument\u0000playing remains a challenge, particularly due to the intricate motor\u0000coordination and temporal synchronization. In this paper, we propose a\u0000theoretical framework for human-robot cooperative piano playing based on\u0000non-verbal cues. First, we present a music improvisation model that employs a\u0000recurrent neural network (RNN) to predict appropriate chord progressions based\u0000on the human's melodic input. Second, we propose a behavior-adaptive controller\u0000to facilitate seamless temporal synchronization, allowing the cobot to generate\u0000harmonious acoustics. The collaboration takes into account the bidirectional\u0000information flow between the human and robot. We have developed an\u0000entropy-based system to assess the quality of cooperation by analyzing the\u0000impact of different communication modalities during human-robot collaboration.\u0000Experiments demonstrate that our RNN-based improvisation can achieve a 93%\u0000accuracy rate. Meanwhile, with the MPC adaptive controller, the robot could\u0000respond to the human teammate in homophony performances with real-time\u0000accompaniment. Our designed framework has been validated to be effective in\u0000allowing humans and robots to work collaboratively in the artistic\u0000piano-playing task.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reactive Collision Avoidance for Safe Agile Navigation","authors":"Alessandro Saviolo, Niko Picello, Rishabh Verma, Giuseppe Loianno","doi":"arxiv-2409.11962","DOIUrl":"https://doi.org/arxiv-2409.11962","url":null,"abstract":"Reactive collision avoidance is essential for agile robots navigating complex\u0000and dynamic environments, enabling real-time obstacle response. However, this\u0000task is inherently challenging because it requires a tight integration of\u0000perception, planning, and control, which traditional methods often handle\u0000separately, resulting in compounded errors and delays. This paper introduces a\u0000novel approach that unifies these tasks into a single reactive framework using\u0000solely onboard sensing and computing. Our method combines nonlinear model\u0000predictive control with adaptive control barrier functions, directly linking\u0000perception-driven constraints to real-time planning and control. Constraints\u0000are determined by using a neural network to refine noisy RGB-D data, enhancing\u0000depth accuracy, and selecting points with the minimum time-to-collision to\u0000prioritize the most immediate threats. To maintain a balance between safety and\u0000agility, a heuristic dynamically adjusts the optimization process, preventing\u0000overconstraints in real time. Extensive experiments with an agile quadrotor\u0000demonstrate effective collision avoidance across diverse indoor and outdoor\u0000environments, without requiring environment-specific tuning or explicit\u0000mapping.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}