{"title":"FFBGNet: Full-Flow Bidirectional Feature Fusion Grasp Detection Network Based on Hybrid Architecture","authors":"Qin Wan;Shunxing Ning;Haoran Tan;Yaonan Wang;Xiaogang Duan;Zhi Li;Yang Yang;Jianhua Qiu","doi":"10.1109/LRA.2024.3511410","DOIUrl":"https://doi.org/10.1109/LRA.2024.3511410","url":null,"abstract":"Effectively integrating the complementary information from RGB-D images presents a significant challenge in robotic grasping. In this letter, we propose a full-flow bidirectional feature fusion grasp detection network (FFBGNet) based on a hybrid architecture to generate accurate grasp poses from RGB-D images. First, we construct an efficient Cross-Modal Feature fusion module as a bridge for information interaction across the full flow of the two branches, where fusion is applied at each encoding and decoding layer. The two branches can thus fully leverage the appearance information in the RGB images and the geometry information in the depth images. Second, a hybrid architecture module with parallel CNN and Transformer branches is developed to achieve better local feature and global information representations. Finally, we conduct qualitative and quantitative comparative experiments on the Cornell and Jacquard datasets, achieving grasp detection accuracies of 99.2% and 96.5%, respectively. Simultaneously, in physical grasping experiments, FFBGNet achieves a 96.7% success rate in cluttered scenes, which further demonstrates the reliability of the proposed method.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 2","pages":"971-978"},"PeriodicalIF":4.6,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142875017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TALKER: A Task-Activated Language Model Based Knowledge-Extension Reasoning System","authors":"Jiabin Lou;Rongye Shi;Yuxin Lin;Qunbo Wang;Wenjun Wu","doi":"10.1109/LRA.2024.3511434","DOIUrl":"https://doi.org/10.1109/LRA.2024.3511434","url":null,"abstract":"Training drones to execute complex collective tasks via multi-agent reinforcement learning presents significant challenges. To address them, this letter introduces the Task-Activated Language model-based Knowledge-Extension Reasoning system (TALKER). Specifically, we train drones in two fine-grained skills and develop an action primitive library based on these capabilities, enabling a hierarchical approach to managing complex swarm operations. Leveraging this primitive library, we employ large language models to perform task planning, continuously refining the planning outcomes based on external user feedback. Successful task code is temporarily stored within the action primitive library, and its use is authorized based on internal feedback from maintainers; we define this process as knowledge expansion. In addition, more refined customized prompts are generated from task descriptions and the action primitive documentation, a mechanism referred to as task activation. Our system synergistically integrates the task activation and knowledge expansion mechanisms, enabling continuous evolution through human feedback to effectively manage extensive swarms executing complex collective tasks. Experimental results demonstrate the superior performance of our system in various drone swarm tasks, including collaborative search, object tracking, cooperative interception, and aerial patrol.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 2","pages":"1026-1033"},"PeriodicalIF":4.6,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142875143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Shape-Adaptive Attachment Design for Wearable Devices Using Granular Jamming","authors":"Joseph Brignone;Logan Lancaster;Edoardo Battaglia;Haohan Zhang","doi":"10.1109/LRA.2024.3511417","DOIUrl":"https://doi.org/10.1109/LRA.2024.3511417","url":null,"abstract":"Attaching a wearable device to the user's body comfortably and functionally while accommodating differences and changes in body shape often represents a challenge. In this letter, we propose an approach that addresses this problem through granular jamming, where a granule-filled membrane stiffens when the internal air pressure is rapidly decreased (e.g., by vacuum), causing the granules to jam together due to friction. In its soft state, such a structure conforms to the complex shapes of the human body; jamming the granules via vacuum then switches it to a rigid state for proper device function. We performed an experiment to systematically investigate the effect of multiple design parameters on the ability of such jamming-based interfaces to hold against a lateral force. Specifically, we developed a bench prototype in which modular granular-jamming structures are attached to objects of different sizes and shapes via a downward suspension force. Our data showed that jamming is necessary, increasing overall structural stability by 1.73 to 2.16 N. Furthermore, using three modules, a high suspension force, and a low membrane infill (~25%) also contributes to high resistance to lateral force. Our results lay a foundation for future implementation of wearable attachments using granular-jamming structures.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 1","pages":"476-483"},"PeriodicalIF":4.6,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142798025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding Particles From Video: Property Estimation of Granular Materials via Visuo-Haptic Learning","authors":"Zeqing Zhang;Guangze Zheng;Xuebo Ji;Guanqi Chen;Ruixing Jia;Wentao Chen;Guanhua Chen;Liangjun Zhang;Jia Pan","doi":"10.1109/LRA.2024.3511380","DOIUrl":"https://doi.org/10.1109/LRA.2024.3511380","url":null,"abstract":"Granular materials (GMs) are ubiquitous in daily life, and understanding their properties is important, especially in agriculture and industry. However, existing methods require dedicated measurement equipment and substantial human effort to handle large numbers of particles. In this paper, we introduce a method for estimating the relative values of particle size and density from video of interaction with GMs. It is trained within a visuo-haptic learning framework inspired by a contact model, which reveals the strong correlation between GM properties and the visual-haptic data recorded while a probe is dragged through the GMs. After training, the network maps the visual modality well to the haptic signal and implicitly characterizes the relative distribution of particle properties in its latent embeddings, as interpreted through that contact model. We can therefore analyze GM properties using the trained encoder with visual information alone, without extra sensory modalities or human labeling effort. The presented GM property estimator has been extensively validated via comparison and ablation experiments. Its generalization capability has also been evaluated, and a real-world application on a beach is demonstrated. Experiment videos are available at https://sites.google.com/view/gmwork/vhlearning.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 1","pages":"684-691"},"PeriodicalIF":4.6,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RecGS: Removing Water Caustic With Recurrent Gaussian Splatting","authors":"Tianyi Zhang;Weiming Zhi;Braden Meyers;Nelson Durrant;Kaining Huang;Joshua Mangelson;Corina Barbalata;Matthew Johnson-Roberson","doi":"10.1109/LRA.2024.3511418","DOIUrl":"https://doi.org/10.1109/LRA.2024.3511418","url":null,"abstract":"Water caustics are commonly observed in seafloor imaging data from shallow-water areas. Traditional methods that remove caustic patterns from images often rely on 2D filtering or pre-training on an annotated dataset, hindering performance when generalizing to real-world seafloor data with 3D structures. In this letter, we present a novel method, Recurrent Gaussian Splatting (RecGS), which takes advantage of today's photorealistic 3D reconstruction technology, 3D Gaussian Splatting (3DGS), to separate caustics from seafloor imagery. With a sequence of images taken by an underwater robot, we build 3DGS recurrently and decompose the caustics with low-pass filtering in each iteration. In the experiments, we analyze and compare against different methods, including joint optimization, 2D filtering, and deep learning approaches. The results show that our proposed RecGS paradigm can effectively separate caustics from the seafloor, improving visual appearance, and can potentially be applied to other problems with inconsistent illumination.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 1","pages":"668-675"},"PeriodicalIF":4.6,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GP-Based NMPC for Aerial Transportation of Suspended Loads","authors":"Fotis Panetsos;George C. Karras;Kostas J. Kyriakopoulos","doi":"10.1109/LRA.2024.3511436","DOIUrl":"https://doi.org/10.1109/LRA.2024.3511436","url":null,"abstract":"In this work, we leverage Gaussian Processes (GPs) and present a learning-based control scheme for the transportation of cable-suspended loads with multirotors. Our ultimate goal is to approximate the model discrepancies that exist between the actual and nominal system dynamics. Towards this direction, weighted and sparse GP regression is exploited to approximate the model errors online and guarantee real-time performance, while also ensuring adaptability to the conditions prevailing in the outdoor environment where the multirotor is deployed. The learned model errors are fed into a nonlinear Model Predictive Controller (NMPC), formulated for the corrected system dynamics, which drives the multirotor towards reference positions while simultaneously minimizing the cable angular motion, regardless of the outdoor conditions and of external disturbances, primarily stemming from the unknown wind. The proposed scheme is validated through simulations and real-world experiments with an octorotor, demonstrating an 80% reduction in steady-state position error under 4 Beaufort wind conditions compared to the nominal NMPC.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 1","pages":"524-531"},"PeriodicalIF":4.6,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Measurement Model-Based Fusion of Capacitive Proximity Sensor and LiDAR for Improved Mobile Robot Perception","authors":"Hyunchang Kang;Hongsik Yim;Hyukjae Sung;Hyouk Ryeol Choi","doi":"10.1109/LRA.2024.3511432","DOIUrl":"https://doi.org/10.1109/LRA.2024.3511432","url":null,"abstract":"This study introduces a novel algorithm that combines a custom-developed capacitive proximity sensor with LiDAR. This integration targets the limitations of single-sensor systems for mobile robot perception. Our approach deals with the non-Gaussian distribution that arises during the nonlinear transformation of capacitive sensor data into distance measurements. The non-Gaussian distribution resulting from this nonlinear transformation is linearized using a first-order Taylor approximation, creating a measurement model unique to our sensor. This method establishes a linear relationship between capacitance values and their corresponding distance measurements. Assuming that the capacitance's standard deviation (σ) remains constant, it is modeled as a function of distance. By linearizing the capacitance data and synthesizing it with LiDAR data using Gaussian methods, we fuse the sensor information to enhance integration. This results in more precise and robust distance measurements than those obtained through traditional Extended Kalman Filter (EKF) and Adaptive Extended Kalman Filter (AEKF) methods. The proposed algorithm is designed for real-time data processing, significantly improving the robot's state estimation accuracy and stability in various environments. This study offers a reliable method for positional estimation of mobile robots, showcasing outstanding fusion performance in complex settings.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 1","pages":"836-843"},"PeriodicalIF":4.6,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142825922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CMIF-VIO: A Novel Cross Modal Interaction Framework for Visual Inertial Odometry","authors":"Zhenyu Wang;Yunzhou Zhang;Xiao Xu;Mengchen Xiong;Xin Su;Fanle Meng","doi":"10.1109/LRA.2024.3511374","DOIUrl":"https://doi.org/10.1109/LRA.2024.3511374","url":null,"abstract":"Visual Inertial Odometry (VIO) estimates trajectories from self-motion. With the popularization of artificial intelligence, deep learning-based VIO methods have shown better performance than traditional geometry-based VIO methods. However, in deep learning methods, how best to fuse and complement visual images from cameras with Inertial Measurement Unit (IMU) measurements to output accurate poses remains a challenge. In this letter, we propose a novel Cross Modal Interaction Framework for VIO, named CMIF-VIO, which improves the accuracy of VIO while maintaining good real-time performance. Specifically, we first use an existing backbone network, and build a simple backbone network, to extract features from the camera and IMU separately, ensuring low complexity. Then, we explore a cross modal interaction module that adaptively integrates information from different modal features, achieving deep interaction between visual and IMU modal features while maintaining feature dominance in each modal branch. Finally, a Long Short Term Memory (LSTM) network is introduced to model temporal motion correlation and output high-precision 6-degree-of-freedom (6-DOF) poses. The experimental results show that our method exhibits better performance than state-of-the-art VIO methods, and its real-time performance can meet the needs of practical application scenarios.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 2","pages":"875-882"},"PeriodicalIF":4.6,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Modeling of Non-Gaussian Aleatoric Uncertainty","authors":"Aastha Acharya;Caleb Lee;Marissa D'Alonzo;Jared Shamwell;Nisar R. Ahmed;Rebecca Russell","doi":"10.1109/LRA.2024.3511376","DOIUrl":"https://doi.org/10.1109/LRA.2024.3511376","url":null,"abstract":"Deep learning offers promising new ways to accurately model aleatoric uncertainty in robotic state estimation systems, particularly when the uncertainty distributions do not conform to traditional assumptions of being fixed and Gaussian. In this study, we formulate and evaluate three fundamental deep learning approaches for conditional probability density modeling to quantify non-Gaussian aleatoric uncertainty: parametric, discretized, and generative modeling. We systematically compare the respective strengths and weaknesses of these three methods on simulated non-Gaussian densities as well as on real-world terrain-relative navigation data. Our results show that these deep learning methods can accurately capture complex uncertainty patterns, highlighting their potential for improving the reliability and robustness of estimation systems.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 1","pages":"660-667"},"PeriodicalIF":4.6,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Teleoperation of a Suspended Aerial Manipulator Using a Handheld Camera With an IMU","authors":"Miguel Arpa Perozo;Ethan Niddam;Sylvain Durand;Loïc Cuvillon;Jacques Gangloff","doi":"10.1109/LRA.2024.3511373","DOIUrl":"https://doi.org/10.1109/LRA.2024.3511373","url":null,"abstract":"This letter presents a simple, low-cost teleoperation system. The leader device is a handheld camera integrated with an Inertial Measurement Unit (IMU), making it feasible to use a modern smartphone for this purpose. Existing leader devices require hardware and sensors both to measure user interactions and to control the follower device. By contrast, the proposed method uses the handheld camera both as a leader device and as a sensor to control the position of the follower device through visual servoing. To the best of the authors' knowledge, this visual servoing scenario where the camera is held by a user has not been thoroughly studied. The measurements from the handheld device and the follower are fused in an Extended Kalman Filter (EKF) to further improve the pose estimation. A Virtual Camera and IMU (VCI) concept is introduced to filter hand tremors for teleoperation efficiency without hindering the bandwidth of the relative pose control loop. The EKF and VCI performance are assessed experimentally by teleoperating a Suspended Aerial Manipulator (AMES) prototype.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 1","pages":"700-707"},"PeriodicalIF":4.6,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}