{"title":"Learning dexterity from human hand motion in internet videos","authors":"Kenneth Shaw, Shikhar Bahl, Aravind Sivakumar, Aditya Kannan, Deepak Pathak","doi":"10.1177/02783649241227559","DOIUrl":"https://doi.org/10.1177/02783649241227559","url":null,"abstract":"To build general robotic agents that can operate in many environments, it is often useful for robots to collect experience in the real world. However, unguided experience collection is often not feasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing as real world experience: videos of humans using their hands. To utilize these videos, we develop a method that retargets any 1st person or 3rd person video of human hands and arms into the robot hand and arm trajectories. While retargeting is a difficult problem, our key insight is to rely on only internet human hand video to train it. We use this method to present results in two areas: First, we build a system that enables any human to control a robot hand and arm, simply by demonstrating motions with their own hand. The robot observes the human operator via a single RGB camera and imitates their actions in real-time. This enables the robot to collect real-world experience safely using supervision. See these results at https://robotic-telekinesis.github.io . Second, we retarget in-the-wild human internet video into task-conditioned pseudo-robot trajectories to use as artificial robot experience. This learning algorithm leverages action priors from human hand actions, visual features from the images, and physical priors from dynamical systems to pretrain typical human behavior for a particular robot task. We show that by leveraging internet human hand experience, we need fewer robot demonstrations compared to many other methods. 
See these results at https://video-dex.github.io","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"19 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139608207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kernel-based diffusion approximated Markov decision processes for autonomous navigation and control on unstructured terrains","authors":"Junhong Xu, Kai Yin, Zheng Chen, Jason M Gregory, Ethan A Stump, Lantao Liu","doi":"10.1177/02783649231225977","DOIUrl":"https://doi.org/10.1177/02783649231225977","url":null,"abstract":"We propose a diffusion approximation method to the continuous-state Markov decision processes that can be utilized to address autonomous navigation and control in unstructured off-road environments. In contrast to most decision-theoretic planning frameworks that assume fully known state transition models, we design a method that eliminates such a strong assumption that is often extremely difficult to engineer in reality. We first take the second-order Taylor expansion of the value function. The Bellman optimality equation is then approximated by a partial differential equation, which only relies on the first and second moments of the transition model. By combining the kernel representation of the value function, we design an efficient policy iteration algorithm whose policy evaluation step can be represented as a linear system of equations characterized by a finite set of supporting states. We first validate the proposed method through extensive simulations in 2D obstacle avoidance and 2.5D terrain navigation problems. The results show that the proposed approach leads to a much superior performance over several baselines. We then develop a system that integrates our decision-making framework with onboard perception and conduct real-world experiments in both cluttered indoor and unstructured outdoor environments. 
The results from the physical systems further demonstrate the applicability of our method in challenging real-world environments.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"93 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139612708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A cross-domain challenge with panoptic segmentation in agriculture","authors":"Michael Halstead, Patrick Zimmer, Chris McCool","doi":"10.1177/02783649241227448","DOIUrl":"https://doi.org/10.1177/02783649241227448","url":null,"abstract":"Automation in agriculture is a growing area of research with fundamental societal importance as farmers are expected to produce more and better crop with fewer resources. A key enabling factor is robotic vision techniques allowing us to sense and then interact with the environment. A limiting factor for these robotic vision systems is their cross-domain performance, that is, their ability to operate in a large range of environments. In this paper, we propose the use of auxiliary tasks to enhance cross-domain performance without the need for extra data. We perform experiments using four datasets (two in a glasshouse and two in arable farmland) for four cross-domain evaluations. These experiments demonstrate the effectiveness of our auxiliary tasks to improve network generalisability. In glasshouse experiments, our approach improves the panoptic quality of things from 10.4 to 18.5 and in arable farmland from 16.0 to 27.5; where a score of 100 is the best. To further evaluate the generalisability of our approach, we perform an ablation study using the large Crop and Weed dataset (CAW) where we improve cross-domain performance (panoptic quality of things) from 12.8 to 30.6 for the CAW dataset to our novel WeedAI dataset, and 21.2 to 36.0 from CAW to the other arable farmland dataset. Although our proposed approaches considerably improve cross-domain performance we still do not generally outperform in-domain trained systems. 
This highlights the potential room for improvement in this area and the importance of cross-domain research for robotic vision systems.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"120 17","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139616240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent robotic sonographer: Mutual information-based disentangled reward learning from few demonstrations","authors":"Zhongliang Jiang, Yuan Bi, Mingchuan Zhou, Ying Hu, Michael Burke, Nassir Navab","doi":"10.1177/02783649231223547","DOIUrl":"https://doi.org/10.1177/02783649231223547","url":null,"abstract":"Ultrasound (US) imaging is widely used for biometric measurement and diagnosis of internal organs due to the advantages of being real-time and radiation-free. However, due to inter-operator variations, resulting images highly depend on the experience of sonographers. This work proposes an intelligent robotic sonographer to autonomously “explore” target anatomies and navigate a US probe to standard planes by learning from the expert. The underlying high-level physiological knowledge from experts is inferred by a neural reward function, using a ranked pairwise image comparison approach in a self-supervised fashion. This process can be referred to as understanding the “language of sonography.” Considering the generalization capability to overcome inter-patient variations, mutual information is estimated by a network to explicitly disentangle the task-related and domain features in latent space. The robotic localization is carried out in coarse-to-fine mode based on the predicted reward associated with B-mode images. To validate the effectiveness of the proposed reward inference network, representative experiments were performed on vascular phantoms (“line” target), two types of ex vivo animal organ phantoms (chicken heart and lamb kidney representing “point” target), and in vivo human carotids. To further validate the performance of the autonomous acquisition framework, physical robotic acquisitions were performed on three phantoms (vascular, chicken heart, and lamb kidney). The results demonstrated that the proposed advanced framework can robustly work on a variety of seen and unseen phantoms as well as in vivo human carotid data. 
Code: https://github.com/yuan-12138/MI-GPSR . Video: https://youtu.be/u4ThAA9onE0 .","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139945629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lane-level route planning for autonomous vehicles","authors":"Mitchell Jones, Maximilian Haas-Heger, Jur van den Berg","doi":"10.1177/02783649231225474","DOIUrl":"https://doi.org/10.1177/02783649231225474","url":null,"abstract":"We present an algorithm that, given a representation of a road network in lane-level detail, computes a route that minimizes the expected cost to reach a given destination. In doing so, our algorithm allows us to solve for the complex trade-offs encountered when trying to decide not just which roads to follow, but also when to change between the lanes making up these roads, in order to, for example, reduce the likelihood of missing a left exit while not unnecessarily driving in the leftmost lane. This routing problem can naturally be formulated as a Markov Decision Process (MDP), in which lane change actions have stochastic outcomes. However, MDPs are known to be time-consuming to solve in general. In this paper, we show that, under reasonable assumptions, we can use a Dijkstra-like approach to solve this stochastic problem, and benefit from its efficient O(n log n) running time. This enables an autonomous vehicle to exhibit lane-selection behavior as it efficiently plans an optimal route to its destination.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139945625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging symmetries in pick and place","authors":"Haojie Huang, Dian Wang, Arsh Tangri, Robin Walters, Robert Platt","doi":"10.1177/02783649231225775","DOIUrl":"https://doi.org/10.1177/02783649231225775","url":null,"abstract":"Robotic pick and place tasks are symmetric under translations and rotations of both the object to be picked and the desired place pose. For example, if the pick object is rotated or translated, then the optimal pick action should also rotate or translate. The same is true for the place pose; if the desired place pose changes, then the place action should also transform accordingly. A recently proposed pick and place framework known as Transporter Net (Zeng, Florence, Tompson, Welker, Chien, Attarian, Armstrong, Krasin, Duong, Sindhwani et al., 2021) captures some of these symmetries, but not all. This paper analytically studies the symmetries present in planar robotic pick and place and proposes a method of incorporating equivariant neural models into Transporter Net in a way that captures all symmetries. The new model, which we call Equivariant Transporter Net, is equivariant to both pick and place symmetries and can immediately generalize pick and place knowledge to different pick and place poses. 
We evaluate the new model empirically and show that it is much more sample-efficient than the non-symmetric version, resulting in a system that can imitate demonstrated pick and place behavior using very few human demonstrations on a variety of imitation learning tasks.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139945704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-reflective terrain-aware robot adaptation for consistent off-road ground navigation","authors":"Sriram Siva, Maggie Wigness, John G. Rogers, Long Quang, Hao Zhang","doi":"10.1177/02783649231225243","DOIUrl":"https://doi.org/10.1177/02783649231225243","url":null,"abstract":"Ground robots require the crucial capability of traversing unstructured and unprepared terrains and avoiding obstacles to complete tasks in real-world robotics applications such as disaster response. When a robot operates in off-road field environments such as forests, the robot’s actual behaviors often do not match its expected or planned behaviors, due to changes in the characteristics of terrains and the robot itself. Therefore, the capability of robot adaptation for consistent behavior generation is essential for maneuverability on unstructured off-road terrains. In order to address the challenge, we propose a novel method of self-reflective terrain-aware adaptation for ground robots to generate consistent controls to navigate over unstructured off-road terrains, which enables robots to more accurately execute the expected behaviors through robot self-reflection while adapting to varying unstructured terrains. To evaluate our method’s performance, we conduct extensive experiments using real ground robots with various functionality changes over diverse unstructured off-road terrains. 
The comprehensive experimental results have shown that our self-reflective terrain-aware adaptation method enables ground robots to generate consistent navigational behaviors and outperforms the compared previous and baseline techniques.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"68 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139945626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rapid locomotion via reinforcement learning","authors":"Gabriel B. Margolis, Ge Yang, Kartik Paigwar, Tao Chen, Pulkit Agrawal","doi":"10.1177/02783649231224053","DOIUrl":"https://doi.org/10.1177/02783649231224053","url":null,"abstract":"Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy for sim-to-real transfer. Videos of the robot’s behaviors are available at https://agility.csail.mit.edu/ .","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"134 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139945624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-optimal trajectories for skid-steer rovers","authors":"M. Effati, Krzysztof Skonieczny, Devin J. Balkcom","doi":"10.1177/02783649231216499","DOIUrl":"https://doi.org/10.1177/02783649231216499","url":null,"abstract":"This paper presents the energy-optimal trajectories for skid-steer rovers on hard ground, without obstacles. We obtain 29 trajectory structures that are sufficient to describe minimum-energy motion, which are enumerated and described geometrically; 28 of these structures are composed of sequences of circular arcs and straight lines; there is also a special structure called whirls consisting of different circular arcs. Our analysis identifies that the turns in the trajectory structures (aside from whirls) are all circular arcs of a particular turning radius, R′, the turning radius at which the inner wheels of a skid-steer rover are not commanded to turn. This work demonstrates its paramount importance in energy-optimal path planning. There has been a lack of analytical energy-optimal trajectory generation for skid-steer rovers, and we address this problem by a novel approach. The equivalency theorem presented in this work shows that all minimum-energy solutions follow the same path irrespective of velocity constraints that may or may not be imposed. This non-intuitive result stems from the fact that with this model of the system the total energy is fully parameterized by the geometry of the path alone. With this equivalency in mind, one can choose velocity constraints to enforce constant power consumption, thus transforming the energy-optimal problem into an equivalent time-optimal problem. Pontryagin’s Minimum Principle can then be used to solve the problem. Accordingly, the extremal paths are obtained and enumerated to find the minimum-energy path. 
Furthermore, our experimental results by using Husky UGV provide the experimental support for the equivalency theorem.","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":" 33","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139141432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CID-SIMS: Complex indoor dataset with semantic information and multi-sensor data from a ground wheeled robot viewpoint","authors":"Yidi Zhang, Ning An, Chenhui Shi, Shuo Wang, Hao Wei, Pengju Zhang, Xinrui Meng, Zengpeng Sun, Jinke Wang, Wenliang Liang, Fulin Tang, Yihong Wu","doi":"10.1177/02783649231222507","DOIUrl":"https://doi.org/10.1177/02783649231222507","url":null,"abstract":"Simultaneous localization and mapping (SLAM) and 3D reconstruction have numerous applications for indoor ground wheeled robots such as floor sweeping and food delivery. To advance research in leveraging semantic information and multi-sensor data to enhance the performances of SLAM and 3D reconstruction in complex indoor scenes, we propose a novel and complex indoor dataset named CID-SIMS, where semantic annotated RGBD images, inertial measurement unit (IMU) measurements, and wheel odometer data are provided from a ground wheeled robot viewpoint. The dataset consists of 22 challenging sequences captured in nine different scenes including office building and apartment environments. Notably, our dataset achieves two significant breakthroughs. Firstly, semantic information and multi-sensor data are provided meanwhile for the first time. Secondly, GeoSLAM is utilized for the first time to generate ground truth trajectories and 3D point clouds within two-centimeter accuracy. With spatial-temporal synchronous ground truth trajectories and 3D point clouds, our dataset is capable of evaluating SLAM and 3D reconstruction algorithms in a unified global coordinate system. We evaluate state-of-the-art SLAM and 3D reconstruction approaches on our dataset, demonstrating that our benchmark is applicable. 
The dataset is publicly available on https://cid-sims.github.io .","PeriodicalId":501362,"journal":{"name":"The International Journal of Robotics Research","volume":"59 16","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138950921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}