{"title":"DRIM: Depth Restoration With Interference Mitigation in Multiple LiDAR Depth Cameras","authors":"Seunghui Shin;Jaeyun Jang;Sundong Park;Hyoseok Hwang","doi":"10.1109/LRA.2025.3619771","DOIUrl":"https://doi.org/10.1109/LRA.2025.3619771","url":null,"abstract":"LiDAR depth cameras are widely used for accurate depth measurement in various applications. However, when multiple cameras operate simultaneously, mutual interference causes artifacts in the captured depth data, which existing image restoration methods struggle to handle. In this letter, we propose DRIM, a novel approach for real-time depth restoration under multi-device interference. Our method begins by distinguishing interference-induced artifacts, then predicts and leverages these artifacts to guide the restoration process. Since there is no existing dataset for learning interference in multiple LiDAR depth cameras, we create and provide the first depth interference dataset. Our experiments demonstrate superior depth restoration performance compared to other image restoration methods, achieving real-time processing speeds (<inline-formula><tex-math>$approx$</tex-math></inline-formula>33 FPS) that are significantly faster than existing approaches while showing the capability to restore depth in challenging scenarios. These results demonstrate that our proposed method effectively restores interfered depth in multiple LiDAR depth cameras with practical real-time performance.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 11","pages":"12079-12086"},"PeriodicalIF":5.3,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145315395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting FMCW Radar Heat Map Object Detection With Raw ADC Data","authors":"Long Zhuang;Yiqing Yao;Taihong Yang;Zijian Wang;Tao Zhang","doi":"10.1109/LRA.2025.3617727","DOIUrl":"https://doi.org/10.1109/LRA.2025.3617727","url":null,"abstract":"Millimeter-wave (mmWave) radar is crucial for environmental perception in autonomous driving, especially under complex conditions. While radar heatmaps provide richer information than point clouds, extracting semantic details from heatmaps alone remains challenging. To address this, we propose leveraging raw radar Analog-to-Digital Converter (ADC) data and introduce Mamba-RODNet, a novel network that integrates radar heatmaps with ADC data. For long-sequence modeling such as ADC, Mamba outperforms Transformers in both accuracy and efficiency, making it well suited for autonomous driving perception. We further design an ADC-Mamba (AM) module that fuses multi-scale features from ADC and heatmaps, enhancing detection performance. Experiments on the large-scale RADDet dataset show that our method achieves state-of-the-art results in both average precision (AP) and floating point operations per second (FLOPs). Ablation studies demonstrate that incorporating ADC data improves mean Average Precision (mAP) by 7%. In summary, this work establishes a new paradigm for integrating raw mmWave radar ADC data into object detection, with significant implications for the field. Our code is available at here.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 11","pages":"12087-12094"},"PeriodicalIF":5.3,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145315313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Seeing Beyond Local Events: Recurrent Optical Flow Estimation With Hierarchical Motion Aggregation","authors":"Daikun Liu;Teng Wang;Changyin Sun","doi":"10.1109/LRA.2025.3617737","DOIUrl":"https://doi.org/10.1109/LRA.2025.3617737","url":null,"abstract":"Current event-based optical flow estimation methods typically utilize at most two event streams as input, overlooking the role of temporal coherence present in continuous event streams for the current motion estimation. Moreover, existing simple motion propagation strategies are insufficient for propagating historical motion information effectively. To this end, we propose TREFlow, a recurrent event-based optical flow estimation framework with hierarchical motion aggregation. Our method aggregates rich motion features in a short-to-long-term manner. We introduce a Short-Term Motion Encoding (STME) module and a Long-Term Memory Aggregation (LTMA) module to capture dense motion features within the current temporal window and comprehensively incorporate historical motion prior knowledge, respectively, thereby enhancing and compensating the current motion representation. Our method outperforms other methods in optical flow inference on MVSEC and DSEC-Flow.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 11","pages":"11721-11728"},"PeriodicalIF":5.3,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145255888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open-Vocabulary Online Semantic Mapping for SLAM","authors":"Tomas Berriel Martins;Martin R. Oswald;Javier Civera","doi":"10.1109/LRA.2025.3617736","DOIUrl":"https://doi.org/10.1109/LRA.2025.3617736","url":null,"abstract":"This letter presents an <underline>O</u>pen-<underline>V</u>ocabulary <underline>O</u>nline 3D semantic mapping pipeline, that we denote by its acronym OVO. Given a sequence of posed RGB-D frames, we detect and track 3D segments, which we describe using CLIP vectors. These are computed from the viewpoints where they are observed by a novel CLIP merging method. Notably, our OVO has a significantly lower computational and memory footprint than offline baselines, while also showing better segmentation metrics than offline and online ones. Along with superior segmentation performance, we also show experimental results of our mapping contributions integrated with two different full SLAM backbones (Gaussian-SLAM and ORB-SLAM2), being the first ones using a neural network to merge CLIP descriptors and demonstrating end-to-end open-vocabulary online 3D mapping with loop closure.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 11","pages":"11745-11752"},"PeriodicalIF":5.3,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11192614","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145255827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Indoor Occupancy Prediction via Sparse Query-Based Multi-Level Consistent Knowledge Distillation","authors":"Xiang Li;Yupeng Zheng;Pengfei Li;Yilun Chen;Ya-Qin Zhang;Wenchao Ding","doi":"10.1109/LRA.2025.3615532","DOIUrl":"https://doi.org/10.1109/LRA.2025.3615532","url":null,"abstract":"Occupancy prediction provides critical geometric and semantic understanding for robotics but faces efficiency-accuracy trade-offs. Current dense methods suffer computational waste on empty voxels, while sparse query-based approaches lack robustness in diverse and complex indoor scenes. In this letter, we propose DiScene, a novel sparse query-based framework that leverages multi-level distillation to achieve efficient and robust occupancy prediction. In particular, our method incorporates two key innovations: (1) a Multi-level Consistent Knowledge Distillation strategy, which transfers hierarchical representations from large teacher models to lightweight students through coordinated alignment across four levels, including encoder-level feature alignment, query-level feature matching, prior-level spatial guidance, and anchor-level high-confidence knowledge transfer and (2) a Teacher-Guided Initialization policy, employing optimized parameter warm-up to accelerate model convergence. Validated on the Occ-Scannet benchmark, DiScene achieves 23.2 FPS without depth priors while outperforming our baseline method, OPUS, by 36.1% and even better than the depth-enhanced version, OPUS<inline-formula><tex-math>$dagger$</tex-math></inline-formula>. With depth integration, DiScene<inline-formula><tex-math>$dagger$</tex-math></inline-formula> attains new SOTA performance, surpassing EmbodiedOcc by 3.7% with 1.62× faster inference speed. Furthermore, experiments on the Occ3D-nuScenes benchmark and in-the-wild scenarios demonstrate the versatility of our approach in various environments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 11","pages":"11690-11697"},"PeriodicalIF":5.3,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145255825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural Profiling With fNIRS of Operator Performance in Teleoperated Human-Like Social Robot Interactions","authors":"David Achanccaray;Javier Andreu-Perez;Hidenobu Sumioka","doi":"10.1109/LRA.2025.3615526","DOIUrl":"https://doi.org/10.1109/LRA.2025.3615526","url":null,"abstract":"Social robot teleoperation is a skill that must be acquired through practice with the social robot. Mobile neuroimaging and human-computer interface performance metrics permit the gathering of information from the operators’ systemic and behavioral responses associated with their skill acquisition. Profiling the skill levels of social robot operators using this information can help improve training protocols. In this study, thirty-two participants performed real-world social robot teleoperation tasks. Brain function signals from the prefrontal cortex (PFC), and behavioral data from interactions with the system were collected using functional near-infrared spectroscopy (fNIRS). Participants were divided into two groups (high and low performance) based on an integrative metric of task efficiency, workload, and presence when operating the social robot. Significant differences were found in the operation time, width, and multiscale entropy of the hemoglobin oxygenation curve of the operator’s PFC. Functional connectivity in the PFC also depicted differences in the low- and high-performance groups when connectivity networks were compared and in the leaf fraction metrics of the functional networks. These findings contribute to understanding the operator’s progress during teleoperation training protocols and designing the interface to assist in enhancing task performance.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 11","pages":"12095-12102"},"PeriodicalIF":5.3,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11184209","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145315461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory-Maze: Scenario Driven Visual Language Navigation Benchmark for Guiding Blind People","authors":"Masaki Kuribayashi;Kohei Uehara;Allan Wang;Daisuke Sato;Renato Alexandre Ribeiro;Simon Chu;Shigeo Morishima","doi":"10.1109/LRA.2025.3615028","DOIUrl":"https://doi.org/10.1109/LRA.2025.3615028","url":null,"abstract":"Visual Language Navigation (VLN) powered robots have the potential to guide blind people by understanding route instructions provided by sighted passersby. This capability allows robots to operate in environments often unknown a prior. Existing VLN models are insufficient for the scenario of navigation guidance for blind people, as they need to understand routes described from human memory, which frequently contains stutters, errors, and omissions of details, as opposed to those obtained by thinking out loud, such as in the R2R dataset. However, existing benchmarks do not contain instructions obtained from human memory in natural environments. To this end, we present our benchmark, Memory-Maze, which simulates the scenario of seeking route instructions for guiding blind people. Our benchmark contains a maze-like structured virtual environment and novel route instruction data from human memory. Our analysis demonstrates that instruction data collected from memory was longer and contained more varied wording. We further demonstrate that addressing errors and ambiguities from memory-based instructions is challenging, by evaluating state-of-the-art models alongside our baseline model with modularized perception and controls.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 11","pages":"11658-11665"},"PeriodicalIF":5.3,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145255823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Human–Drone Interaction via Active Multimodal Gesture Recognition Under Limited Field of View in Indoor Environments","authors":"Weicheng Fang;Ganghua Lai;Yushu Yu;Chuanbeibei Shi;Xin Meng;Jiali Sun;Zhenchao Cui","doi":"10.1109/LRA.2025.3615031","DOIUrl":"https://doi.org/10.1109/LRA.2025.3615031","url":null,"abstract":"Gesture recognition, an important method for Human-Drone Interaction (HDI), is often constrained by sensor limitations, such as sensitivity to lighting variations and field of view (FoV) restrictions. This letter proposes a real-time drone control system that integrates multimodal fusion gesture recognition with active perception control to overcome these challenges. We constructed a diversified arm gesture dataset and designed a lightweight, one-stage point cloud and image features fusion model, Adaptive Gate Fusion Network (AGFNet), for real-time inference on embedded devices. We applied motion compensation to mitigate delay errors caused by point clouds accumulation and network inference during movement, and fused the drone’s velocity data and detection results using the Extended Kalman Filter (EKF) to enhance real-time performance. This enabled active perception through the optimization of the sensor’s FoV using perception-aware Model Predictive Control (PAMPC). Experimental results demonstrate that the proposed model achieves a threefold improvement in inference speed compared to the baseline, reaching 97.93% mAP on the test set and outperforming single-sensor networks by approximately 5%. Real-world testing further confirms the system’s applicability and effectiveness in indoor environments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 11","pages":"11705-11712"},"PeriodicalIF":5.3,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145255819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximating Global Contact-Implicit MPC via Sampling and Local Complementarity","authors":"Sharanya Venkatesh;Bibit Bianchini;Alp Aydinoglu;William Yang;Michael Posa","doi":"10.1109/LRA.2025.3615030","DOIUrl":"https://doi.org/10.1109/LRA.2025.3615030","url":null,"abstract":"To achieve general-purpose dexterous manipulation, robots must rapidly devise and execute contact-rich behaviors. Existing model-based controllers cannot globally optimize in real time over the exponential number of possible contact sequences. Instead, progress in contact-implicit control leverages simpler models that, while still hybrid, make local approximations. Locality limits the controller to exploit only nearby interactions, requiring intervention to richly explore contacts more broadly. Our approach leverages the strengths of local complementarity-based control combined with low-dimensional, but global, sampling of possible end effector locations. Our key insight is to consider a contact-free stage preceding a contact-rich stage at every control loop. Our algorithm, in parallel, samples end effector locations to which the contact-free stage can move the robot, then considers the cost predicted by contact-rich MPC local to each sampled location. The result is a globally-informed, contact-implicit controller capable of real-time dexterous manipulation. We demonstrate our controller on precise, non-prehensile manipulation of non-convex objects with a Franka arm.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 11","pages":"12117-12124"},"PeriodicalIF":5.3,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145315396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Detection of Objects Near a Robot Manipulator via Miniature Time-of-Flight Sensors","authors":"Carter Sifferman;Mohit Gupta;Michael Gleicher","doi":"10.1109/LRA.2025.3615037","DOIUrl":"https://doi.org/10.1109/LRA.2025.3615037","url":null,"abstract":"We provide a method for detecting and localizing objects near a robot arm using arm-mounted miniature time-of-flight sensors. A key challenge when using arm-mounted sensors is differentiating between the robot itself and external objects in sensor measurements. To address this challenge, we propose a computationally lightweight method which utilizes the raw time-of-flight information captured by many off-the-shelf, low-resolution time-of-flight sensor. We build an empirical model of expected sensor measurements in the presence of the robot alone, and use this model at runtime to detect objects in proximity to the robot. In addition to avoiding robot self-detections in common sensor configurations, the proposed method enables extra flexibility in sensor placement, unlocking configurations which achieve more efficient coverage of a radius around the robot arm. Our method can detect small objects near the arm and localize the position of objects along the length of a robot link to reasonable precision. We evaluate the performance of the method with respect to object type, location, and ambient light level, and identify limiting factors on performance inherent in the measurement principle. The proposed method has potential applications in collision avoidance and in facilitating safe human-robot interaction.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 11","pages":"11682-11689"},"PeriodicalIF":5.3,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145255885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}