{"title":"Visual-Inertial Localization Leveraging Skylight Polarization Pattern Constraints","authors":"Zhenhua Wan;Peng Fu;Kunfeng Wang;Kaichun Zhao","doi":"10.1109/LRA.2024.3495375","DOIUrl":"https://doi.org/10.1109/LRA.2024.3495375","url":null,"abstract":"In this letter, we develop a tightly coupled polarization-visual-inertial localization system that utilizes naturally-attributed polarized skylight to provide a global heading. We introduce a focal plane polarization camera with negligible instantaneous field-of-view error to collect polarized skylight. Then, we design a robust heading determination method from polarized skylight and construct a global stable heading constraint. In particular, this constraint compensates for the heading unobservability present in standard VINS. In addition to the standard sparse visual feature measurements used in VINS, polarization heading residuals are constructed and co-optimized in a tightly-coupled VINS update. An adaptive fusion strategy is designed to correct the cumulative drift. Outdoor real-world experiments show that the proposed method outperforms state-of-the-art VINS-Fusion in terms of localization accuracy, and improves 22% over VINS-Fusion in a wooded campus environment.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11481-11488"},"PeriodicalIF":4.6,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CMGFA: A BEV Segmentation Model Based on Cross-Modal Group-Mix Attention Feature Aggregator","authors":"Xinkai Kuang;Runxin Niu;Chen Hua;Chunmao Jiang;Hui Zhu;Ziyu Chen;Biao Yu","doi":"10.1109/LRA.2024.3495376","DOIUrl":"https://doi.org/10.1109/LRA.2024.3495376","url":null,"abstract":"Bird's eye view (BEV) segmentation map is a recent development in autonomous driving that provides effective environmental information, such as drivable areas and lane dividers. Most of the existing methods use cameras and LiDAR as inputs for segmentation and the fusion of different modalities is accomplished through either concatenation or addition operations, which fails to exploit fully the correlation and complementarity between modalities. This letter presents the CMGFA (Cross-Modal Group-mix attention Feature Aggregator), an end-to-end learning framework that can adapt to multiple modal feature combinations for BEV segmentation. The CMGFA comprises the following components: i) The camera has a dual-branch structure that strengthens the linkage between local and global features. ii) Multi-head deformable cross-attention is applied as cross-modal feature aggregators to aggregate camera, LiDAR, and Radar feature maps in BEV for implicit fusion. iii) The Group-Mix attention is used to enrich the attention map feature space and enhance the ability to segment between different categories. We evaluate our proposed method on the nuScenes and Argoverse2 datasets, where the CMGFA significantly outperforms the baseline.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11497-11504"},"PeriodicalIF":4.6,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtual Obstacles Regulation for Multi-Agent Path Finding","authors":"Sike Zeng;Xi Chen;Li Chai","doi":"10.1109/LRA.2024.3494653","DOIUrl":"https://doi.org/10.1109/LRA.2024.3494653","url":null,"abstract":"Multi-agent path finding (MAPF) involves finding collision-free paths for multiple agents while minimizing the total path costs. Explicit estimation conflict-based search (EECBS) represents a state-of-the-art variant of the widely used conflict-based search (CBS) method, offering bounded-suboptimal solutions. However, both CBS and its variants rely on pairwise conflict resolution methods. A conflict boom means many conflicts occur at one location, which frequently exists in scenarios that a large number of agents operate in small space, and usually leads to heavy computational burden. The location that conflict boom occurs is regarded as conflict boom vertex. This letter proposes a novel method, the Virtual Obstacles Regulation, to expedite algorithmic solving processes (such as EECBS) for MAPF. The proposed method identifies conflicts boom vertices and strategically regulates them as global or local virtual obstacles to circumvent concentrated conflicts. Then, the pairwise conflict resolution processes on conflicts boom vertices are significantly simplified, hence accelerating overall algorithm runtime–often dominated by conflict resolution. Numerical studies validate the efficacy of this approach.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11417-11424"},"PeriodicalIF":4.6,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shape Visual Servoing of a Cable Suspended Between Two Drones","authors":"Lev Smolentsev;Alexandre Krupa;François Chaumette","doi":"10.1109/LRA.2024.3494655","DOIUrl":"https://doi.org/10.1109/LRA.2024.3494655","url":null,"abstract":"In this letter, we propose a shape visual servoing approach for manipulating a suspended cable attached between two quadrotor drones. A leader-follower control strategy is presented, where a human operator controls the rigid motion of the cable by teleoperating one drone (the leader), while the second drone (the follower) performs a shape visual servoing task to autonomously apply a desired deformation to the cable. The proposed shape visual servoing approach uses an RGB-D camera embedded on the follower drone and has the advantage to rely on a simple geometrical model of the cable that only requires the knowledge of its length. In the same time, our control strategy maintains the best visibility of the cable in the camera field of view. A robust image processing pipeline allows detecting and tracking in real-time the cable shape from the data provided by the onboard RGB-D camera. Experimental results demonstrate the effectiveness of the proposed visual control approach to shape a flexible cable into a desired shape. In addition, we demonstrate experimentally that such system can be used to perform an aerial transport task by grasping with the cable an object fitted with a hook, then moving and releasing it at another location.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11473-11480"},"PeriodicalIF":4.6,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dynamic Calibration Framework for the Event-Frame Stereo Camera System","authors":"Rui Hu;Jürgen Kogler;Margrit Gelautz;Min Lin;Yuanqing Xia","doi":"10.1109/LRA.2024.3491426","DOIUrl":"https://doi.org/10.1109/LRA.2024.3491426","url":null,"abstract":"The fusion of event cameras and conventional frame cameras is a novel research field, and a stereo structure consisting of an event camera and a frame camera can incorporate the advantages of both. This letter develops a dynamic calibration framework for the event-frame stereo camera system. In this framework, the first step is to complete the initial detection on a circle-grid calibration pattern, and a sliding-window time matching method is proposed to match the event-frame pairs. Then, a refining method is devised for two cameras to get the accurate information of the pattern. Particularly, for the event camera, a patch-size motion compensation method with high computational efficiency is designed to achieve time synchronization for two cameras and fit circles in an image of warped events. Finally, the pose between two cameras is globally optimized by constructing a pose-landmark graph with two types of edges. The proposed calibration framework has the advantages of high real-time performance and easy deployment, and its effectiveness is verified by experiments based on self-recorded datasets.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11465-11472"},"PeriodicalIF":4.6,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimal Impact Pokes to Place Objects on Planar Surfaces","authors":"Ahmed Zermane;Léo Moussafir;Youcan Yan;Abderrahmane Kheddar","doi":"10.1109/LRA.2024.3491412","DOIUrl":"https://doi.org/10.1109/LRA.2024.3491412","url":null,"abstract":"We present a planning and control method that computes a minimal sequence of pokes to slide a given object from an initial pose to a desired final one (or as close to it as possible) on a planar surface. Both planning and control are based on impact models to generate pokes. Our framework takes into account the object's dynamics with a rich contact model and parameters to plan the poking sequence. The planning is conducted in the joint-space and generates trajectories tracked using an impact-aware QP control, which corrects for post-pokes errors using discrete visual feedback. We implemented our method on a Panda robot arm and assessed its versatility and robustness. The experimental results show that the proposed poking approach can bring the object to the desired position and orientation with minimal errors (0.05 m for translation and 0.2 rd for rotation), highlighting its potential application in diverse industrial scenarios such as logistics.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11393-11400"},"PeriodicalIF":4.6,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DLO Perceiver: Grounding Large Language Model for Deformable Linear Objects Perception","authors":"Alessio Caporali;Kevin Galassi;Gianluca Palli","doi":"10.1109/LRA.2024.3491428","DOIUrl":"https://doi.org/10.1109/LRA.2024.3491428","url":null,"abstract":"The perception of Deformable Linear Objects (DLOs) is a challenging task due to their complex and ambiguous appearance, lack of discernible features, typically small sizes, and deformability. Despite these challenges, achieving a robust and effective segmentation of DLOs is crucial to introduce robots into environments where they are currently underrepresented, such as domestic and complex industrial settings. In this context, the integration of language-based inputs can simplify the perception task while also enabling the possibility of introducing robots as human companions. Therefore, this letter proposes a novel architecture for the perception of DLOs, wherein the input image is augmented with a text-based prompt guiding the segmentation of the target DLO. After encoding the image and text separately, a Perceiver-inspired structure is exploited to compress the concatenated data into transformer layers and generate the output mask from a latent vector representation. The method is experimentally evaluated on real-world images of DLOs like electrical cables and ropes, validating its efficacy and efficiency in real practical scenarios.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11385-11392"},"PeriodicalIF":4.6,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10742556","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proximal Control of UAVs With Federated Learning for Human-Robot Collaborative Domains","authors":"Lucas Nogueira Nobrega;Ewerton de Oliveira;Martin Saska;Tiago Nascimento","doi":"10.1109/LRA.2024.3491417","DOIUrl":"https://doi.org/10.1109/LRA.2024.3491417","url":null,"abstract":"The human-robot interaction (HRI) is a growing area of research. In HRI, complex command (action) classification is still an open problem that usually prevents the real applicability of such a technique. The literature presents some works that use neural networks to detect these actions. However, occlusion is still a major issue in HRI, especially when using uncrewed aerial vehicles (UAVs), since, during the robot's movement, the human operator is often out of the robot's field of view. Furthermore, in multi-robot scenarios, distributed training is also an open problem. In this sense, this work proposes an action recognition and control approach based on Long Short-Term Memory (LSTM) Deep Neural Networks with two layers in association with three densely connected layers and Federated Learning (FL) embedded in multiple drones. The FL enabled our approach to be trained in a distributed fashion, i.e., access to data without the need for cloud or other repositories, which facilitates the multi-robot system's learning. Furthermore, our multi-robot approach results also prevented occlusion situations, with experiments with real robots achieving an accuracy greater than 96%.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11305-11312"},"PeriodicalIF":4.6,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142645395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RoadRunner M&M - Learning Multi-Range Multi-Resolution Traversability Maps for Autonomous Off-Road Navigation","authors":"Manthan Patel;Jonas Frey;Deegan Atha;Patrick Spieler;Marco Hutter;Shehryar Khattak","doi":"10.1109/LRA.2024.3490404","DOIUrl":"https://doi.org/10.1109/LRA.2024.3490404","url":null,"abstract":"Autonomous robot navigation in off–road environments requires a comprehensive understanding of the terrain geometry and traversability. The degraded perceptual conditions and sparse geometric information at longer ranges make the problem challenging especially when driving at high speeds. Furthermore, the sensing–to–mapping latency and the look–ahead map range can limit the maximum speed of the vehicle. Building on top of the recent work RoadRunner, in this work, we address the challenge of long-range (\u0000<inline-formula><tex-math>$pm 100 ,text{m}$</tex-math></inline-formula>\u0000) traversability estimation. Our RoadRunner (M&M) is an end-to-end learning-based framework that directly predicts the traversability and elevation maps at multiple ranges (\u0000<inline-formula><tex-math>$pm 50 ,text{m}$</tex-math></inline-formula>\u0000, \u0000<inline-formula><tex-math>$pm 100 ,text{m}$</tex-math></inline-formula>\u0000) and resolutions (\u0000<inline-formula><tex-math>$0.2 ,text{m}$</tex-math></inline-formula>\u0000, \u0000<inline-formula><tex-math>$0.8 ,text{m}$</tex-math></inline-formula>\u0000) taking as input multiple images and a LiDAR voxel map. Our method is trained in a self–supervised manner by leveraging the dense supervision signal generated by fusing predictions from an existing traversability estimation stack (X-Racer) in hindsight and satellite Digital Elevation Maps. RoadRunner M&M achieves a significant improvement of up to 50% for elevation mapping and 30% for traversability estimation over RoadRunner, and is able to predict in 30% more regions compared to X-Racer while achieving real–time performance. Experiments on various out–of–distribution datasets also demonstrate that our data-driven approach starts to generalize to novel unstructured environments. We integrate our proposed framework in closed–loop with the path planner to demonstrate autonomous high–speed off–road robotic navigation in challenging real–world environments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11425-11432"},"PeriodicalIF":4.6,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bimanual Grasp Synthesis for Dexterous Robot Hands","authors":"Yanming Shao;Chenxi Xiao","doi":"10.1109/LRA.2024.3490393","DOIUrl":"https://doi.org/10.1109/LRA.2024.3490393","url":null,"abstract":"Humans naturally perform bimanual skills to handle large and heavy objects. To enhance robots' object manipulation capabilities, generating effective bimanual grasp poses is essential. Nevertheless, bimanual grasp synthesis for dexterous hand manipulators remains underexplored. To bridge this gap, we propose the BimanGrasp algorithm for synthesizing bimanual grasps on 3D objects. The BimanGrasp algorithm generates grasp poses by optimizing an energy function that considers grasp stability and feasibility. Furthermore, the synthesized grasps are verified using the Isaac Gym physics simulation engine. These verified grasp poses form the BimanGrasp-Dataset, the first large-scale synthesized bimanual dexterous hand grasp pose dataset to our knowledge. The dataset comprises over 150k verified grasps on 900 objects, facilitating the synthesis of bimanual grasps through a data-driven approach. Last, we propose BimanGrasp-DDPM, a diffusion model trained on the BimanGrasp-Dataset. This model achieved a grasp synthesis success rate of 69.87% and significant acceleration in computational speed compared to BimanGrasp algorithm.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11377-11384"},"PeriodicalIF":4.6,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}