Stabilizing deep Q-learning with Q-graph-based bounds
Sabrina Hoppe, Markus Giftthaler, R. Krug, Marc Toussaint
International Journal of Robotics Research, 42(1): 633–654, 2023-07-25. DOI: 10.1177/02783649231185165

Abstract: State-of-the-art deep reinforcement learning has enabled autonomous agents to learn complex strategies from scratch on many problems, including continuous control tasks. Deep Q-networks (DQN) and deep deterministic policy gradients (DDPG) are two such algorithms, both based on Q-learning. They therefore share function approximation, off-policy behavior, and bootstrapping: the constituents of the so-called deadly triad, which is known for its convergence issues. We suggest taking a graph perspective on the data an agent has collected and show that the structure of this data graph is linked to the degree of divergence that can be expected. We further demonstrate that a subset of states and actions can be selected from the data graph such that the resulting finite graph can be interpreted as a simplified Markov decision process (MDP) for which the Q-values can be computed analytically. These Q-values are lower bounds on the Q-values of the original problem, and enforcing these bounds in temporal difference learning can help to prevent soft divergence. We show further effects on a simulated continuous control task, including improved sample efficiency, increased robustness toward hyperparameters, and a better ability to cope with limited replay memory. Finally, we demonstrate the benefits of our method on a large robotic benchmark with an industrial assembly task and approximately 60 h of real-world interaction.

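As a concrete illustration of the bound-enforcement idea described in the abstract above, a minimal NumPy sketch might clip bootstrapped TD targets from below with graph-derived Q-values. The function name and array layout here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def bounded_td_targets(rewards, next_q, q_lower, gamma=0.99, dones=None):
    """Clip bootstrapped TD targets from below with graph-based Q-bounds.

    q_lower holds Q-values computed analytically on the finite data-graph
    MDP; since these are lower bounds on the true Q-values, any bootstrapped
    target that falls below them is lifted back up before the critic's
    regression step.
    """
    if dones is None:
        dones = np.zeros_like(rewards)
    targets = rewards + gamma * (1.0 - dones) * next_q
    return np.maximum(targets, q_lower)  # enforce Q(s, a) >= graph lower bound
```

A DQN or DDPG critic update would then regress Q(s, a) onto these clipped targets instead of the raw bootstrapped ones.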
Robotic drilling for the Chinese Chang’E 5 lunar sample-return mission
Zhang Tao, Yong Pang, Ting Zeng, Guoxing Wang, Shen Yin, Kun Xu, Guidong Mo, Xingwang Zhang, Lusi Wang, Shuai Yang, Zengzeng Zhao, Junjie Qin, Junshan Gong, Zhongxiang Zhao, Xuefeng Tong, Zhongwang Yin, Haiyuan Wang, Fan Zhao, Yanhong Zheng, Xiangjin Deng, Bin Wang, Jinchang Xu, Wei Wang, Shuangfei Yu, Xiaoming Lai, Xilun Ding
International Journal of Robotics Research, 42(1): 586–613, 2023-07-01. DOI: 10.1177/02783649231187918

Abstract: On December 2, 2020, a 2-m-class robotic drill onboard the Chinese Chang’E 5 lunar lander successfully penetrated 1 m into the lunar regolith and collected 259.72 g of samples. This paper presents the design and development, terrestrial tests, and lunar sampling results of the robotic drill. First, the system design of the robotic drill was studied, including its engineering objectives, drill configuration, drilling and coring methods, and the determination of rotational speed. Subsequently, a control strategy was proposed to address the geological uncertainty and complexity of the lunar surface. Terrestrial tests were conducted to assess the sampling performance of the robotic drill under both atmospheric and vacuum conditions. Finally, the results of drilling on the lunar surface were obtained, and the complex geological conditions encountered were analyzed. The success of the Chinese Chang’E 5 lunar sample-return mission demonstrates the feasibility of the proposed robotic drill. This study can serve as an important reference for future extraterrestrial robotic regolith-sampling missions.

ViF-GTAD: A new automotive dataset with ground truth for ADAS/AD development, testing, and validation
Sarah Haas, Selim Solmaz, Jakob Reckenzaun, Simon Genser
International Journal of Robotics Research, 42(1): 614–630, 2023-07-01. DOI: 10.1177/02783649231188146

Abstract: The new automated-driving dataset that is the subject of this paper identifies and addresses a gap in existing perception datasets. While most state-of-the-art perception datasets focus on providing various onboard sensor measurements together with semantic information under various driving conditions, the provided information is often insufficient, since the object lists and position data include unknown, time-varying errors. This paper and the associated dataset describe the first publicly available perception dataset that includes not only onboard camera, lidar, and radar measurements with semantically classified objects but also high-precision ground-truth position measurements, enabled by accurate RTK-assisted GPS localization systems on both the ego vehicle and the dynamic target objects. The paper provides insight into how the data were captured, explains the metadata structure and content, and outlines application examples where the dataset has been, and can be, applied in the development, testing, and validation of automated driving and environmental perception systems.

Adaptive Robotic Information Gathering via non-stationary Gaussian processes
Weizhe Chen, Roni Khardon, Lantao Liu
International Journal of Robotics Research, 2023-06-27. DOI: 10.1177/02783649231184498

Abstract: Robotic Information Gathering (RIG) is a foundational research topic that asks how a robot (or team of robots) can collect informative data to efficiently build an accurate model of an unknown target function under robot embodiment constraints. RIG has many applications, including autonomous exploration and mapping, 3D reconstruction and inspection, search and rescue, and environmental monitoring. A RIG system relies on a probabilistic model's prediction uncertainty to identify critical areas for informative data collection. Gaussian processes (GPs) with stationary kernels have been widely adopted for spatial modeling. However, real-world spatial data are typically non-stationary: different locations do not have the same degree of variability. As a result, the prediction uncertainty does not accurately reflect prediction error, limiting the success of RIG algorithms. We propose a family of non-stationary kernels named the Attentive Kernel (AK), which is simple and robust and can extend any existing kernel to a non-stationary one. We evaluate the new kernel on elevation mapping tasks, where the AK provides better accuracy and uncertainty quantification than the commonly used stationary kernels and the leading non-stationary kernels. The improved uncertainty quantification guides the downstream informative planner to collect more valuable data around high-error areas, further increasing prediction accuracy. A field experiment demonstrates that the proposed method can guide an Autonomous Surface Vehicle (ASV) to prioritize data collection in locations with significant spatial variation, enabling the model to characterize salient environmental features.

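The idea of extending a stationary kernel with input-dependent weights can be sketched as follows. This is a simplified illustration in the spirit of the Attentive Kernel, not the paper's exact formulation: the weighting function (here a plain callable) would be a small neural network in practice, and all names are assumptions.

```python
import numpy as np

def rbf(x1, x2, lengthscale):
    """Stationary squared-exponential base kernel."""
    d2 = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def attentive_kernel(x1, x2, weight_fn, lengthscales):
    """Non-stationary kernel from an input-dependent mixture of RBFs.

    weight_fn maps inputs of shape (n, d) to non-negative weights of shape
    (n, M), one column per base lengthscale; correlating the weights of two
    inputs lets the effective lengthscale vary across the domain.
    """
    w1, w2 = weight_fn(x1), weight_fn(x2)
    K = np.zeros((x1.shape[0], x2.shape[0]))
    for m, ls in enumerate(lengthscales):
        # Schur product of a rank-one PSD matrix and a valid kernel matrix
        K += np.outer(w1[:, m], w2[:, m]) * rbf(x1, x2, ls)
    return K
```

Because each summand is the element-wise product of a rank-one positive semi-definite matrix and a stationary kernel matrix, the sum remains a valid (PSD) kernel.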
Composable energy policies for reactive motion generation and reinforcement learning
Julen Urain, Anqi Li, Puze Liu, Carlo D’Eramo, Jan Peters
International Journal of Robotics Research, 2023-06-26. DOI: 10.1177/02783649231179499

Abstract: In this work, we introduce composable energy policies (CEP), a novel framework for multi-objective motion generation. We frame the problem of composing multiple policy components from a probabilistic view. We consider a set of stochastic policies represented in arbitrary task spaces, where each policy is a distribution over the actions that solve a particular task. We then aim to find the action in the configuration space that optimally satisfies all the policy components. The presented framework allows the fusion of motion generators from different sources: optimal control, data-driven policies, motion planning, and handcrafted policies. Classically, multi-objective motion generation is solved by composing a set of deterministic, rather than stochastic, policies. However, there are common situations in which different policy components have conflicting behaviors, leading to oscillations or to the robot getting stuck in an undesirable state. While our approach does not directly solve the conflicting-policies problem, we argue that modeling each component as a stochastic policy allows more expressive representations than classical reactive motion generation approaches. In some tasks, such as reaching a target in a cluttered environment, we show experimentally that the additional expressivity of CEP lets us model policies that reduce these conflicting behaviors. One field that benefits from such reactive motion generators is robot reinforcement learning: integrating these policy architectures with reinforcement learning allows us to include a set of inductive biases in the learning problem. These inductive biases guide the reinforcement learning agent toward informative regions or improve collision safety during exploration. We show how to integrate the proposed reactive motion generator as a structured policy for reinforcement learning; combining the agent's exploration with the prior-based CEP improves learning performance and makes exploration safer.

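A minimal sketch of the probabilistic fusion idea from the abstract above: if each policy component is modeled as a Gaussian over actions, their product is again Gaussian, and the optimally satisfying action is the precision-weighted mean. This is an illustrative special case only, not the CEP implementation, which composes energy functions over arbitrary task spaces.

```python
import numpy as np

def fuse_gaussian_policies(means, covs):
    """Precision-weighted fusion of Gaussian policy components.

    Multiplying Gaussian densities N(mu_i, Sigma_i) yields (up to
    normalization) a Gaussian whose precision is the sum of the component
    precisions and whose mean is the precision-weighted average: components
    that are more certain about an action dimension pull the fused action
    harder in their direction.
    """
    precisions = [np.linalg.inv(c) for c in covs]
    P = sum(precisions)
    Sigma = np.linalg.inv(P)
    mu = Sigma @ sum(Pi @ m for Pi, m in zip(precisions, means))
    return mu, Sigma
```

With two equally confident components pulling toward different targets, the fused mean lands halfway between them, while the fused covariance shrinks, reflecting the combined evidence.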
Minimizing running buffers for tabletop object rearrangement: Complexity, fast algorithms, and applications
Kai Gao, Si Wei Feng, Baichuan Huang, Jingjin Yu
International Journal of Robotics Research, 2023-06-08. DOI: 10.1177/02783649231178565

Abstract: When rearranging objects on a tabletop with overhand grasps, temporarily relocating objects to some buffer space may be necessary. This raises the natural question of how many simultaneous storage spaces, or "running buffers," are required for certain classes of tabletop rearrangement problems to be feasible. In this work, we examine the problem in both labeled and unlabeled settings. On the structural side, we observe that finding the minimum number of running buffers (MRB) can be carried out on a dependency graph abstracted from a problem instance, and we show that computing the MRB is NP-hard. We then prove that, in both the labeled and unlabeled settings, even for uniform cylindrical objects, the number of required running buffers may grow without bound as the number of objects to be rearranged increases; we further show that the bound for the unlabeled case is tight. On the algorithmic side, we develop effective exact algorithms for finding the MRB in both labeled and unlabeled tabletop rearrangement problems, scalable to over a hundred objects at very high object density. More importantly, our algorithms also compute a sequence witnessing the computed MRB that can be used to solve object rearrangement tasks. Empirical evaluations with these algorithms reveal that random labeled and unlabeled instances, which closely mimic real-world setups, generally have fairly small MRBs. Using real robot experiments, we demonstrate that the running-buffer abstraction leads to state-of-the-art solutions for in-place rearrangement of many objects in a tight, bounded workspace.

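The running-buffer notion can be made concrete with a tiny brute-force search over handling orders on the dependency graph. Since computing the MRB is NP-hard, this exponential sketch is only viable for toy instances, and the encoding (a set of blocking objects per object, with pairwise-disjoint goal poses assumed) is a simplifying assumption for illustration, not the paper's algorithm.

```python
from itertools import permutations

def min_running_buffers(blockers):
    """Brute-force MRB for a tiny labeled instance.

    blockers[i] is the set of objects whose start pose overlaps object i's
    goal pose (edges of the dependency graph).  An object can go straight to
    its goal once all of its blockers have been picked off their start poses;
    otherwise it waits in a buffer.  Returns the minimum, over all handling
    orders, of the peak number of simultaneously buffered objects.
    """
    n = len(blockers)
    best = n
    for order in permutations(range(n)):
        picked, buf, peak = set(), set(), 0
        for o in order:
            picked.add(o)
            if not (blockers[o] <= picked):
                buf.add(o)                 # goal still blocked: buffer it
                peak = max(peak, len(buf))
            # picking o freed its start pose; retire any buffered object
            # whose goal is now clear
            changed = True
            while changed:
                changed = False
                for b in list(buf):
                    if blockers[b] <= picked:
                        buf.discard(b)
                        changed = True
        best = min(best, peak)
    return best
```

For example, two objects that must swap places need one buffer, as does a three-cycle, while three mutually blocking objects need two.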
Control-oriented meta-learning
Spencer M. Richards, Navid Azizan, Jean-Jacques Slotine, Marco Pavone
International Journal of Robotics Research, 2023-06-07. DOI: 10.1177/02783649231165085

Abstract: Real-time adaptation is imperative for the control of robots operating in complex, dynamic environments. Adaptive control laws can endow even nonlinear systems with good trajectory tracking performance, provided that any uncertain dynamics terms are linearly parameterizable with known nonlinear features. However, it is often difficult to specify such features a priori, for example, for aerodynamic disturbances on rotorcraft or interaction forces between a manipulator arm and various objects. In this paper, we turn to data-driven modeling with neural networks to learn, offline from past data, an adaptive controller with an internal parametric model of these nonlinear features. Our key insight is that we can better prepare the controller for deployment with control-oriented meta-learning of features in closed-loop simulation, rather than regression-oriented meta-learning of features that fit input-output data. Specifically, we meta-learn the adaptive controller with closed-loop tracking simulation as the base-learner and the average tracking error as the meta-objective. With both fully actuated and underactuated nonlinear planar rotorcraft subject to wind, we demonstrate that our adaptive controller outperforms other controllers trained with regression-oriented meta-learning when deployed in closed loop for trajectory tracking control.

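The control-oriented meta-objective described above can be illustrated on a toy 1-D plant: the base-learner is a closed-loop tracking rollout with an online adaptation law, and the meta-objective is the average tracking error as a function of the learned feature parameter. All dynamics, gains, and names below are assumptions for illustration, not the paper's rotorcraft models.

```python
import numpy as np

def tracking_meta_loss(phi, x_ref, dt=0.01, gamma=5.0, k=2.0, w_true=1.5):
    """Closed-loop tracking rollout as the base-learner (toy sketch).

    The 1-D plant is  x' = u + w_true * f(x)  with an unknown coefficient
    w_true on a nonlinear feature f.  The controller uses a learned feature
    f_phi(x) = tanh(phi * x), where phi is the meta-parameter, and adapts
    its coefficient estimate online.  The returned value is the average
    squared tracking error over the rollout: the meta-objective that
    control-oriented meta-learning minimizes over phi.
    """
    f_true = lambda x: np.tanh(x)         # true feature, hidden from the controller
    f_phi = lambda x: np.tanh(phi * x)    # learned feature
    x, w_hat, errs = 0.0, 0.0, []
    for r in x_ref:
        e = x - r
        u = -k * e - w_hat * f_phi(x)     # certainty-equivalent control law
        w_hat += dt * gamma * e * f_phi(x)  # Lyapunov-based adaptation law
        x += dt * (u + w_true * f_true(x))  # Euler step of the plant
        errs.append(e * e)
    return float(np.mean(errs))
```

An outer meta-learning loop would minimize this loss over phi; a feature parameter that matches the plant's true nonlinearity lets the adaptation law cancel the disturbance, while a badly mismatched one leaves a persistent tracking error.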
Enabling four-arm laparoscopic surgery by controlling two robotic assistants via haptic foot interfaces
Jacob Hernandez Sanchez, Walid Amanhoud, A. Billard, M. Bouri
International Journal of Robotics Research, 42(1): 475–503, 2023-06-01. DOI: 10.1177/02783649231180366

Abstract: Robotic surgery is a promising way to ease the daily work of surgeons and assistants relative to conventional surgery. In this work, we propose solo laparoscopic surgery in which two robotic arms, controlled via haptic foot interfaces, assist the work of the hands. Such a system opens the door to simultaneous control of four laparoscopic tools by the same user: each hand controls a manipulative tool, while one foot controls an endoscope/camera and the other an actuated gripper. In this scenario, the surgeon and the robots must work collaboratively in a shared workspace while meeting the precision demands of surgery. To this end, we propose a control framework for the robotic arms that handles all task- and safety-related constraints. Furthermore, to ease control through the feet, two assistance modalities are proposed: adaptive visual tracking of the laparoscopic instruments with the camera, and grasping assistance for the gripper. A user study with twelve subjects highlights the ease of use of the system and evaluates the relevance of the proposed shared-control strategies. The results confirm the feasibility of four-arm surgical-like tasks, without extensive training, in tasks that involve visual-tracking and manipulation goals for the feet as well as coordination with both hands. Moreover, our study characterizes and motivates the use of robotic assistance for reducing task load, improving performance, increasing fluency, and eliciting higher coordination during four-arm laparoscopic tasks.

Integrated planning and control of robotic surgical instruments for task autonomy
Fangxun Zhong, Yun-hui Liu
International Journal of Robotics Research, 42(1): 504–536, 2023-05-13. DOI: 10.1177/02783649231179753

Abstract: Agile maneuvers are essential for robot-enabled complex tasks such as surgical procedures. Prior explorations of surgical autonomy have been limited to feasibility studies of completing a single task, without systematically addressing generic manipulation safety across different tasks. We present an integrated planning and control framework for 6-DoF robotic instruments that automates pipelines of surgical tasks. We leverage the geometry of a robotic instrument and propose a nodal state space to represent the robot state in SE(3). Each elementary robot motion can be encoded by regulating the state parameters via a dynamical system. This theoretically ensures that every in-process trajectory is globally feasible and stably reaches an admissible target, and the controller is in closed form, without computing 6-DoF inverse kinematics. Then, to plan motion steps reliably, we propose an interactive (instant) goal state of the robot that transforms manipulation planning under desired path constraints into a goal-varying manipulation (GVM) problem. We detail how GVM can adaptively and smoothly plan the procedure (proceeding or rewinding the process as needed) based on on-the-fly situations in dynamic or disturbed environments. Finally, we extend this policy to characterize complete pipelines for various surgical tasks. Simulations show that our framework smoothly solves twisted maneuvers while avoiding collisions. Physical experiments using the da Vinci Research Kit validate the capability of automating individual tasks including tissue debridement, dissection, and wound suturing. The results confirm good task-level consistency and reliability compared with state-of-the-art automation algorithms.

Eiffel Tower: A deep-sea underwater dataset for long-term visual localization
Clémentin Boittiaux, C. Dune, Maxime Ferrera, A. Arnaubec, R. Marxer, M. Matabos, Loïc Van Audenhaege, Vincent Hugel
International Journal of Robotics Research, 42(1): 689–699, 2023-05-09. DOI: 10.1177/02783649231177322

Abstract: Visual localization plays an important role in the positioning and navigation of robotic systems within previously visited environments. When visits occur over long periods of time, changes in the environment related to seasons or day-night cycles present a major challenge. Underwater, the variability stems from other factors, such as water conditions or the growth of marine organisms. It nevertheless remains a major, and much less studied, obstacle, partly due to the lack of data. This paper presents a new deep-sea dataset to benchmark underwater long-term visual localization. The dataset is composed of images from four visits to the same hydrothermal vent edifice over the course of 5 years. Camera poses and a common geometry of the scene were estimated using navigation data and Structure-from-Motion, and serve as a reference when evaluating visual localization techniques. An analysis of the data provides insights into the major changes observed over the years. Furthermore, several well-established visual localization methods are evaluated on the dataset, showing that there is still room for improvement in underwater long-term visual localization. The data are made publicly available at seanoe.org/data/00810/92226/.