{"title":"Diffusion models for robotic manipulation: a survey.","authors":"Rosa Wolf, Yitian Shi, Sheng Liu, Rania Rayyes","doi":"10.3389/frobt.2025.1606247","DOIUrl":"10.3389/frobt.2025.1606247","url":null,"abstract":"<p><p>Diffusion generative models have demonstrated remarkable success in visual domains such as image and video generation. They have also recently emerged as a promising approach in robotics, especially in robot manipulation. Diffusion models leverage a probabilistic framework, and they stand out for their ability to model multi-modal distributions and their robustness to high-dimensional input and output spaces. This survey provides a comprehensive review of state-of-the-art diffusion models in robotic manipulation, including grasp learning, trajectory planning, and data augmentation. Diffusion models for scene and image augmentation lie at the intersection of robotics and computer vision for vision-based tasks, enhancing generalization and mitigating data scarcity. This paper also presents the two main frameworks of diffusion models and their integration with imitation learning and reinforcement learning. In addition, it discusses the common architectures and benchmarks and points out the challenges and advantages of current state-of-the-art diffusion-based methods.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1606247"},"PeriodicalIF":3.0,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12454101/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145139173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial: Advancements in vibration control for space manipulators: actuators, algorithms, and material innovations.","authors":"Javad Tayebi, Ti Chen, Xiaofeng Wu, Anand Kumar Mishra","doi":"10.3389/frobt.2025.1681168","DOIUrl":"10.3389/frobt.2025.1681168","url":null,"abstract":"","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1681168"},"PeriodicalIF":3.0,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12446049/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145114457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining vision and range sensors for AMCL localization in corridor environments with rectangular signs.","authors":"Paloma de la Puente, Germán Vega-Martínez, Patricia Javierre, Javier Laserna, Elena Martin-Arias","doi":"10.3389/frobt.2025.1652251","DOIUrl":"10.3389/frobt.2025.1652251","url":null,"abstract":"<p><p>Localization is widely recognized as a fundamental problem in mobile robotics. Even though robust localization methods do exist for many applications, it is difficult for them to succeed in complex environments and challenging situations. In particular, corridor-like environments present important issues for traditional range-based methods. The main contribution of this paper is the integration of new observation models into the popular AMCL ROS node, considering visual features obtained from the detection of rectangular landmarks. Visual rectangles are distinctive elements which are very common in man-made environments and should be detected and recognized in a robust manner. This hybrid approach is developed and evaluated both for the combination of an omnidirectional camera and a laser sensor (using artificial markers) and for RGB-D sensors (using natural rectangular features). For the latter, this work also introduces RIDGE, a novel algorithm for detecting projected quadrilaterals representing rectangles in images. Simulations and real-world experiments are presented for both cases. As shown and discussed in the article, the proposed approach provides significant advantages for specific conditions and common scenarios such as long straight corridors.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1652251"},"PeriodicalIF":3.0,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12447077/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145114438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating emotional intelligence, memory architecture, and gestures to achieve empathetic humanoid robot interaction in an educational setting.","authors":"Fuze Sun, Lingyu Li, Shixiangyue Meng, Xiaoming Teng, Terry R Payne, Paul Craig","doi":"10.3389/frobt.2025.1635419","DOIUrl":"10.3389/frobt.2025.1635419","url":null,"abstract":"<p><p>This study investigates the integration of individual human traits into an empathetically adaptive educational robot tutor system designed to improve student engagement and learning outcomes, with corresponding Engagement Vector measurements. While prior research in the field of Human-Robot Interaction (HRI) has examined traits such as emotional intelligence, memory-driven personalization, and non-verbal communication in isolation, it has thus far neglected their synchronized integration into a cohesive, operational educational framework. To address this gap, we customize a Multi-Modal Large Language Model (Llama 3.2 from Meta), deployed with modules for human-like traits (emotion, memory, and gestures), into an AI-Agent framework. This constitutes the robot's intelligent core, which mimics the human emotional system, memory architecture, and gesture controller, allowing the robot to behave more empathetically while recognizing and responding appropriately to the student's emotional state. It can also recall the student's past learning record and adapt its style of interaction accordingly. This allows the robot tutor to react to the student in a more sympathetic manner by delivering personalized verbal feedback synchronized with relevant gestures. Our study measures the extent of this effect through the introduction of the Engagement Vector Model, which can serve as a benchmark for judging the quality of the HRI experience. Quantitative and qualitative results demonstrate that such an empathetic, responsive approach significantly improves student engagement and learning outcomes compared with a baseline humanoid robot without these human-like traits. This indicates that robot tutors with empathetic capabilities can create a more supportive, interactive learning experience that ultimately leads to better outcomes for the student.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1635419"},"PeriodicalIF":3.0,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444663/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145114466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating large language models for intuitive robot navigation.","authors":"Ziheng Xue, Arturs Elksnis, Ning Wang","doi":"10.3389/frobt.2025.1627937","DOIUrl":"10.3389/frobt.2025.1627937","url":null,"abstract":"<p><p>Home assistance robots face challenges in natural language interaction, object detection, and navigation, particularly when operating in resource-constrained home environments, which limits their practical deployment. In this study, we propose an AI agent framework based on Large Language Models (LLMs), comprising EnvNet, RoutePlanner, and AIBrain, to explore solutions to these issues. Utilizing quantized LLMs allows the system to operate on resource-limited devices while maintaining robust interaction capabilities. Our proposed method shows promising results in improving natural language understanding and navigation accuracy in home environments, and also provides a valuable exploration of deploying home assistance robots.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1627937"},"PeriodicalIF":3.0,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444764/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145114503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial: Computer vision mechanisms for resource-constrained robotics applications.","authors":"Rui Pimentel de Figueiredo, Christian Limberg, Lorenzo Jamone, Alexandre Bernardino","doi":"10.3389/frobt.2025.1680098","DOIUrl":"https://doi.org/10.3389/frobt.2025.1680098","url":null,"abstract":"","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1680098"},"PeriodicalIF":3.0,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12439220/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145082004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial: Towards HRI of everyday life.","authors":"Karolina Zawieska, Mohammad Obaid, Wafa Johal","doi":"10.3389/frobt.2025.1657518","DOIUrl":"https://doi.org/10.3389/frobt.2025.1657518","url":null,"abstract":"","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1657518"},"PeriodicalIF":3.0,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12434301/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145076387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EEG-CLIP: learning EEG representations from natural language descriptions.","authors":"Tidiane Camaret Ndir, Robin T Schirrmeister, Tonio Ball","doi":"10.3389/frobt.2025.1625731","DOIUrl":"10.3389/frobt.2025.1625731","url":null,"abstract":"<p><p>Deep networks for electroencephalogram (EEG) decoding are often only trained to solve one specific task, such as pathology or age decoding. A more general, task-agnostic approach is to train deep networks to match a (clinical) EEG recording to its corresponding textual medical report and <i>vice versa</i>. This approach was pioneered in the computer vision domain by matching images to their text captions, and subsequently enabled successful zero-shot decoding using textual class prompts. In this work, we follow this approach and develop a contrastive learning framework, EEG-CLIP, that aligns EEG time series and the corresponding clinical text descriptions in a shared embedding space. We investigated its potential for versatile EEG decoding, evaluating performance in a range of few-shot and zero-shot settings. Overall, we show that EEG-CLIP manages to non-trivially align text and EEG representations. Our work presents a promising approach to learning general EEG representations, which could enable easier analysis of diverse decoding questions through zero-shot decoding or training task-specific models from fewer training examples. The code for reproducing our results is available at https://github.com/tidiane-camaret/EEGClip.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1625731"},"PeriodicalIF":3.0,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12417489/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145041848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal perception-driven decision-making for human-robot interaction: a survey.","authors":"Wenzheng Zhao, Kruthika Gangaraju, Fengpei Yuan","doi":"10.3389/frobt.2025.1604472","DOIUrl":"10.3389/frobt.2025.1604472","url":null,"abstract":"<p><p>Multimodal perception is essential for enabling robots to understand and interact with complex environments and human users by integrating diverse sensory data, such as vision, language, and tactile information. This capability plays a crucial role in decision-making in dynamic, complex environments. This survey provides a comprehensive review of advancements in multimodal perception and its integration with decision-making in robotics from 2004 to 2024. We systematically summarize existing multimodal perception-driven decision-making (MPDDM) frameworks, highlighting their advantages in dynamic environments and the methodologies employed in human-robot interaction (HRI). Beyond reviewing these frameworks, we analyze key challenges in multimodal perception and decision-making, focusing on technical integration and sensor noise, adaptation, domain generalization, and safety and robustness. Finally, we outline future research directions, emphasizing the need for adaptive multimodal fusion techniques, more efficient learning paradigms, and human-trusted decision-making frameworks to advance the HRI field.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1604472"},"PeriodicalIF":3.0,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12411148/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145015295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing for flourishing: a conceptual model for enhancing older adults' well-being with social robots.","authors":"Chantal Klier, Birgit Lugrin","doi":"10.3389/frobt.2025.1607373","DOIUrl":"10.3389/frobt.2025.1607373","url":null,"abstract":"<p><p>This article gives a new perspective on designing robotic applications in elderly care, with a special focus on socially assistive robots and seniors' well-being. While various applications have been proposed, there is currently no common conceptual model for designing interventions with social robots for seniors. Therefore, we propose a conceptual model that identifies five key domains for designing applications for socially interactive robots to enhance seniors' well-being. We base our conceptual model on established theories from the social sciences. Namely, we propose that application design should consider integrating Self-Determination Theory by addressing the three basic psychological needs (autonomy, competence, and relatedness) to enhance seniors' well-being. Furthermore, we recommend assessing the impact of social robots on well-being using the five building blocks of the PERMA framework: positive emotions, engagement, relationships, meaning, and accomplishment. By integrating these theoretical perspectives, researchers and developers gain a structured approach to designing social robot applications for cognitively healthy older adults and evaluating their effects.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1607373"},"PeriodicalIF":3.0,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12404932/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145001677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}