{"title":"PEP: Policy-Embedded Trajectory Planning for Autonomous Driving","authors":"Dongkun Zhang;Jiaming Liang;Sha Lu;Ke Guo;Qi Wang;Rong Xiong;Zhenwei Miao;Yue Wang","doi":"10.1109/LRA.2024.3490377","DOIUrl":"https://doi.org/10.1109/LRA.2024.3490377","url":null,"abstract":"Autonomous driving demands proficient trajectory planning to ensure safety and comfort. This letter introduces Policy-Embedded Planner (PEP), a novel framework that enhances closed-loop performance of imitation learning (IL) based planners by embedding a neural policy for sequential ego pose generation, leveraging predicted trajectories of traffic agents. PEP addresses the challenges of distribution shift and causal confusion by decomposing multi-step planning into single-step policy rollouts, applying a coordinate transformation technique to simplify training. PEP allows for the parallel generation of multi-modal candidate trajectories and incorporates both neural and rule-based scoring functions for trajectory selection. To mitigate the negative effects of prediction error on closed-loop performance, we propose an information-mixing mechanism that alternates the utilization of traffic agents' predicted and ground-truth information during training. Experimental validations on nuPlan benchmark highlight PEP's superiority over IL- and rule-based state-of-the-art methods.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11361-11368"},"PeriodicalIF":4.6,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TICMapNet: A Tightly Coupled Temporal Fusion Pipeline for Vectorized HD Map Learning","authors":"Wenzhao Qiu;Shanmin Pang;Hao Zhang;Jianwu Fang;Jianru Xue","doi":"10.1109/LRA.2024.3490384","DOIUrl":"https://doi.org/10.1109/LRA.2024.3490384","url":null,"abstract":"High-Definition (HD) map construction is essential for autonomous driving to accurately understand the surrounding environment. Most existing methods rely on single-frame inputs to predict local map, which often fail to effectively capture the temporal correlations between frames. This limitation results in discontinuities and instability in the generated map.To tackle this limitation, we propose a \u0000<italic>Ti</i>\u0000ghtly \u0000<italic>C</i>\u0000oupled temporal fusion \u0000<italic>Map</i>\u0000 \u0000<italic>Net</i>\u0000work (TICMapNet). TICMapNet breaks down the fusion process into three sub-problems: PV feature alignment, BEV feature adjustment, and Query feature fusion. By doing so, we effectively integrate temporal information at different stages through three plug-and-play modules, using the proposed tightly coupled strategy. Unlike traditional methods, our approach does not rely on camera extrinsic parameters, offering a new perspective for addressing the visual fusion challenge in the field of object detection. Experimental results show that TICMapNet significantly improves upon its single-frame baseline model, achieving at least a 7.0% increase in mAP using just two consecutive frames on the nuScenes dataset, while also showing generalizability across other tasks.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11289-11296"},"PeriodicalIF":4.6,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142645394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"S3E: A Multi-Robot Multimodal Dataset for Collaborative SLAM","authors":"Dapeng Feng;Yuhua Qi;Shipeng Zhong;Zhiqiang Chen;Qiming Chen;Hongbo Chen;Jin Wu;Jun Ma","doi":"10.1109/LRA.2024.3490402","DOIUrl":"https://doi.org/10.1109/LRA.2024.3490402","url":null,"abstract":"The burgeoning demand for collaborative robotic systems to execute complex tasks collectively has intensified the research community's focus on advancing simultaneous localization and mapping (SLAM) in a cooperative context. Despite this interest, the scalability and diversity of existing datasets for collaborative trajectories remain limited, especially in scenarios with constrained perspectives where the generalization capabilities of Collaborative SLAM (C-SLAM) are critical for the feasibility of multi-agent missions. Addressing this gap, we introduce S3E, an expansive multimodal dataset. Captured by a fleet of unmanned ground vehicles traversing four distinct collaborative trajectory paradigms, S3E encompasses 13 outdoor and 5 indoor sequences. These sequences feature meticulously synchronized and spatially calibrated data streams, including 360-degree LiDAR point cloud, high-resolution stereo imagery, high-frequency inertial measurement units (IMU), and Ultra-wideband (UWB) relative observations. Our dataset not only surpasses previous efforts in scale, scene diversity, and data intricacy but also provides a thorough analysis and benchmarks for both collaborative and individual SLAM methodologies.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11401-11408"},"PeriodicalIF":4.6,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Physics-Guided Deep Learning Enabled Surrogate Modeling for Pneumatic Soft Robots","authors":"Sameh I. Beaber;Zhen Liu;Ye Sun","doi":"10.1109/LRA.2024.3490258","DOIUrl":"https://doi.org/10.1109/LRA.2024.3490258","url":null,"abstract":"Soft robots, formulated by soft and compliant materials, have grown significantly in recent years toward safe and adaptable operations and interactions with dynamic environments. Modeling the complex, nonlinear behaviors and controlling the deformable structures of soft robots present challenges. This study aims to establish a physics-guided deep learning (PGDL) computational framework that integrates physical models into deep learning framework as surrogate models for soft robots. Once trained, these models can replace computationally expensive numerical simulations to shorten the computation time and enable real-time control. This PGDL framework is among the first to integrate first principle physics of soft robots into deep learning toward highly accurate yet computationally affordable models for soft robot modeling and control. The proposed framework has been implemented and validated using three different pneumatic soft fingers with different behaviors and geometries, along with two training and testing approaches, to demonstrate its effectiveness and generalizability. The results showed that the mean square error (MSE) of predicted deformed curvature and the maximum and minimum deformation at various loading conditions were as low as \u0000<inline-formula><tex-math>$10^{-4}$</tex-math></inline-formula>\u0000 mm\u0000<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>\u0000. The proposed PGDL framework is constructed from first principle physics and intrinsically can be applicable to various conditions by carefully considering the governing equations, auxiliary equations, and the corresponding boundary and initial conditions.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11441-11448"},"PeriodicalIF":4.6,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inferring Occluded Agent Behavior in Dynamic Games From Noise Corrupted Observations","authors":"Tianyu Qiu;David Fridovich-Keil","doi":"10.1109/LRA.2024.3490398","DOIUrl":"https://doi.org/10.1109/LRA.2024.3490398","url":null,"abstract":"In mobile robotics and autonomous driving, it is natural to model agent interactions as the Nash equilibrium of a noncooperative, dynamic game. These methods inherently rely on observations from sensors such as lidars and cameras to identify agents participating in the game and, therefore, have difficulty when some agents are occluded. To address this limitation, this paper presents an occlusion-aware game-theoretic inference method to estimate the locations of potentially occluded agents, and simultaneously infer the intentions of both visible and occluded agents, which best accounts for the observations of visible agents. Additionally, we propose a receding horizon planning strategy based on an occlusion-aware contingency game designed to navigate in scenarios with potentially occluded agents. Monte Carlo simulations validate our approach, demonstrating that it accurately estimates the game model and trajectories for both visible and occluded agents using noisy observations of visible agents. Our planning pipeline significantly enhances navigation safety when compared to occlusion-ignorant baseline as well.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11489-11496"},"PeriodicalIF":4.6,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Variational DeepMDP: An Efficient Approach for Industrial Assembly in High-Mix, Low-Volume Production","authors":"Grzegorz Bartyzel","doi":"10.1109/LRA.2024.3487490","DOIUrl":"https://doi.org/10.1109/LRA.2024.3487490","url":null,"abstract":"Transferability, along with sample efficiency, is a critical factor for a reinforcement learning (RL) agent's successful application in real-world contact-rich manipulation tasks, such as product assembly. For instance, in the case of the industrial insertion task on high-mix, low-volume (HMLV) production lines, transferability could eliminate the need for machine retooling, thus reducing production line downtimes. In our work, we introduce a method called Multimodal Variational DeepMDP (MVDeepMDP) that demonstrates the ability to generalize to various environmental variations not encountered during training. The key feature of our approach involves learning a multimodal latent dynamic representation. We demonstrate the effectiveness of our method in the context of an electronic parts insertion task, which is challenging for RL agents due to the diverse physical properties of the non-standardized components, as well as simple 3D printed blocks insertion. Furthermore, we evaluate the transferability of MVDeepMDP and analyze the impact of the balancing mechanism of the \u0000<italic>generalized Product-of-Experts</i>\u0000 (gPoE), which is used to combine observable modalities. Finally, we explore the influence of separately processing state modalities of different physical quantities, such as pose and 6D force/torque (F/T) data.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11297-11304"},"PeriodicalIF":4.6,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142645555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"D2S: Representing Sparse Descriptors and 3D Coordinates for Camera Relocalization","authors":"Bach-Thuan Bui;Huy-Hoang Bui;Dinh-Tuan Tran;Joo-Ho Lee","doi":"10.1109/LRA.2024.3487503","DOIUrl":"https://doi.org/10.1109/LRA.2024.3487503","url":null,"abstract":"State-of-the-art visual localization methods mostly rely on complex procedures to match local descriptors and 3D point clouds. However, these procedures can incur significant costs in terms of inference, storage, and updates over time. In this study, we propose a direct learning-based approach that utilizes a simple network named D2S to represent complex local descriptors and their scene coordinates. Our method is characterized by its simplicity and cost-effectiveness. It solely leverages a single RGB image for localization during the testing phase and only requires a lightweight model to encode a complex sparse scene. The proposed D2S employs a combination of a simple loss function and graph attention to selectively focus on robust descriptors while disregarding areas such as clouds, trees, and several dynamic objects. This selective attention enables D2S to effectively perform a binary-semantic classification for sparse descriptors. Additionally, we propose a simple outdoor dataset to evaluate the capabilities of visual localization methods in scene-specific generalization and self-updating from unlabeled observations. Our approach outperforms the previous regression-based methods in both indoor and outdoor environments. It demonstrates the ability to generalize beyond training data, including scenarios involving transitions from day to night and adapting to domain shifts.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11449-11456"},"PeriodicalIF":4.6,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Art of Imitation: Learning Long-Horizon Manipulation Tasks From Few Demonstrations","authors":"Jan Ole von Hartz;Tim Welschehold;Abhinav Valada;Joschka Boedecker","doi":"10.1109/LRA.2024.3487506","DOIUrl":"https://doi.org/10.1109/LRA.2024.3487506","url":null,"abstract":"Task Parametrized Gaussian Mixture Models (TP-GMM) are a sample-efficient method for learning object-centric robot manipulation tasks. However, there are several open challenges to applying TP-GMMs in the wild. In this work, we tackle three crucial challenges synergistically. First, end-effector velocities are non-Euclidean and thus hard to model using standard GMMs. We thus propose to factorize the robot's end-effector velocity into its direction and magnitude, and model them using Riemannian GMMs. Second, we leverage the factorized velocities to segment and sequence skills from complex demonstration trajectories. Through the segmentation, we further align skill trajectories and hence leverage time as a powerful inductive bias. Third, we present a method to automatically detect relevant task parameters \u0000<italic>per</i>\u0000 skill from visual observations. Our approach enables learning complex manipulation tasks from just five demonstrations while using only RGB-D observations. Extensive experimental evaluations on RLBench demonstrate that our approach achieves state-of-the-art performance with 20-fold improved sample efficiency. Our policies generalize across different environments, object instances, and object positions, while the learned skills are reusable.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11369-11376"},"PeriodicalIF":4.6,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open-Structure: Structural Benchmark Dataset for SLAM Algorithms","authors":"Yanyan Li;Zhao Guo;Ze Yang;Yanbiao Sun;Liang Zhao;Federico Tombari","doi":"10.1109/LRA.2024.3487071","DOIUrl":"https://doi.org/10.1109/LRA.2024.3487071","url":null,"abstract":"This letter presents Open-Structure, a novel benchmark dataset for evaluating visual odometry and SLAM methods. Compared to existing public datasets that primarily offer raw images, Open-Structure provides direct access to point and line measurements, correspondences, structural associations, and co-visibility factor graphs, which can be fed to various stages of SLAM pipelines to mitigate the impact of data preprocessing modules in ablation experiments. The dataset comprises two distinct types of sequences from the perspective of scenarios. The first type maintains reasonable observation and occlusion relationships, as these critical elements are extracted from public image-based sequences using our dataset generator. In contrast, the second type consists of carefully designed simulation sequences that enhance dataset diversity by introducing a wide range of trajectories and observations. Furthermore, a baseline is proposed using our dataset to evaluate widely used modules, including camera pose tracking, parametrization, and factor graph optimization, within SLAM systems. By evaluating these state-of-the-art algorithms across different scenarios, we discern each module's strengths and weaknesses in the context of camera tracking and optimization processes.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11457-11464"},"PeriodicalIF":4.6,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biomimetic Robotic Remora With Hitchhiking Ability: Design, Control and Experiment","authors":"Tong Tan;Lin Yu;Kai Guo;Xuyang Wang;Lei Qiao","doi":"10.1109/LRA.2024.3487075","DOIUrl":"https://doi.org/10.1109/LRA.2024.3487075","url":null,"abstract":"Remora, which is well known for it's ’hitchhiking’ behavior, can attach to diverse marine animals and travel with them for a long distance with low energy consumption due to its special disc. In this letter, inspired by the unique ’hitchhiking’ behavior, a new prototype of a robotic remora with good maneuverability, reliable adhesion system and robust motion control strategy is designed to apply the ’hitchhiking’ behavior to engineered system. In the design of the mechanics, a robotic remora with wire-driven propulsion mechanism and pectoral fins diving mechanism is developed to realize the decoupled planar and vertical motion. Besides, a stable adhesion system with considerable adhesive force(\u0000<inline-formula><tex-math>$sim$</tex-math></inline-formula>\u0000274 N), low pre-load demand and perception ability is developed. In the aspect of motion control, considering the underactuated characteristics, highly nonlinear and model uncertainty of the robotic remora dynamics, a planar controller consisted of a line-of-sight guidance law and an active disturbance rejection heading controller, and a proportional-integral-derivative based depth controller are adopted. The combination of the two controllers enable the robotic fish to achieve autonomous motion in three-dimensional space, providing strong support for achieving the ’hitchhiking’ behavior. Extensive experiments, including the coordination between the robotic remora and underwater vehicle, have been conducted to verify the reliability of the designed robotic remora and the corresponding control strategy.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11505-11512"},"PeriodicalIF":4.6,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}