Conference on Robot Learning: Latest Publications

HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration
Conference on Robot Learning Pub Date: 2022-12-08 DOI: 10.48550/arXiv.2212.04359
Xingyu Liu, Deepak Pathak, Kris M. Kitani
Abstract: The ability to learn from human demonstration endows robots with the ability to automate various tasks. However, directly learning from human demonstration is challenging, since the structure of the human hand can be very different from the desired robot gripper. In this work, we show that manipulation skills can be transferred from a human to a robot through micro-evolutionary reinforcement learning, in which a five-fingered human dexterous hand robot gradually evolves into a commercial robot while repeatedly interacting in a physics simulator to continuously update a policy first learned from human demonstration. To deal with the high dimensionality of the robot parameter space, we propose a multi-dimensional evolution path searching algorithm that jointly optimizes the robot evolution path and the policy. Through experiments on human object manipulation datasets, we show that our framework can efficiently transfer expert policies trained from human demonstrations in diverse modalities to target commercial robots.
Citations: 3
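The evolution-path search is described only at a high level in the abstract. As a rough, hedged illustration of the idea rather than the authors' algorithm, the NumPy sketch below shrinks a morphology-blend vector `alpha` (1.0 = human hand, 0.0 = target robot) one dimension at a time, committing a step only when a stubbed-out policy evaluation stays above a floor; `evaluate_policy` and all constants are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_policy(alpha):
    """Stand-in for fine-tuning and evaluating the policy on the
    morphology blended by `alpha`; here a synthetic score only."""
    return 1.0 - 0.1 * np.linalg.norm(alpha) + 0.01 * rng.standard_normal()

# Each dimension of `alpha` blends one group of kinematic parameters
# (finger lengths, joint limits, ...) between the two embodiments.
alpha = np.ones(4)            # start at the human hand
step, floor = 0.1, 0.6        # evolution step size, acceptable return

while alpha.max() > 0.0:
    # Try shrinking each dimension; keep the move that preserves return.
    candidates = []
    for i in range(len(alpha)):
        trial = alpha.copy()
        trial[i] = max(0.0, trial[i] - step)
        candidates.append((evaluate_policy(trial), trial))
    score, best = max(candidates, key=lambda c: c[0])
    if score < floor:
        step *= 0.5           # evolve more gently when performance drops
        if step < 1e-3:
            break
    else:
        alpha = best          # commit one micro-evolution step

print("final morphology blend:", alpha.round(2))
```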
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Conference on Robot Learning Pub Date: 2022-12-08 DOI: 10.48550/arXiv.2212.04573
Yifan Zhou, Shubham D. Sonawani, Mariano Phielipp, Simon Stepputtis, H. B. Amor
Abstract: Language-conditioned policies allow robots to interpret and execute human instructions. Learning such policies requires a substantial investment of time and compute resources, and the resulting controllers are highly device-specific: they cannot easily be transferred to a robot with a different morphology, capability, appearance, or dynamics. In this paper, we propose a sample-efficient approach for training language-conditioned manipulation policies that allows for rapid transfer across different types of robots. By introducing a novel method, Hierarchical Modularity, and adopting supervised attention across multiple sub-modules, we bridge the divide between modular and end-to-end learning and enable the reuse of functional building blocks. In both simulated and real-world robot manipulation experiments, we demonstrate that our method outperforms the current state-of-the-art methods and can transfer policies across four different robots in a sample-efficient manner. Finally, we show that the functionality of learned sub-modules is maintained beyond the training process and can be used to introspect the robot's decision-making process. Code is available at https://github.com/ir-lab/ModAttn.
Citations: 8
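The paper's actual architecture is in the linked repository; as a hedged sketch of what supervising an attention map inside one sub-module could look like, the toy PyTorch code below pools input tokens with an attention distribution and adds a loss that pushes that distribution toward an annotated target. All names, shapes, and the placeholder task loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SupervisedAttention(nn.Module):
    """Toy sub-module: pools tokens with an attention map that can
    itself be supervised (invented architecture, not the paper's)."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))
        self.out = nn.Linear(dim, dim)

    def forward(self, tokens):                        # tokens: (B, T, D)
        attn = (tokens @ self.query).softmax(dim=-1)  # (B, T)
        pooled = (attn.unsqueeze(-1) * tokens).sum(dim=1)
        return self.out(pooled), attn

module = SupervisedAttention(dim=16)
tokens = torch.randn(2, 5, 16)        # e.g. embeddings of scene objects
target = torch.tensor([0, 2])         # annotation: which token matters

feat, attn = module(tokens)
task_loss = feat.pow(2).mean()        # placeholder for the real objective
attn_loss = nn.functional.nll_loss(attn.clamp_min(1e-8).log(), target)
(task_loss + attn_loss).backward()    # train behavior and attention jointly
```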
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
Conference on Robot Learning Pub Date: 2022-12-07 DOI: 10.48550/arXiv.2212.03858
Hao Li, Yizhi Zhang, Junzhe Zhu, Shaoxiong Wang, Michelle A. Lee, Huazhe Xu, E. Adelson, Li Fei-Fei, Ruohan Gao, Jiajun Wu
Abstract: Humans use all of their senses to accomplish different tasks in everyday activities. In contrast, existing work on robotic manipulation mostly relies on one, or occasionally two, modalities, such as vision and touch. In this work, we systematically study how visual, auditory, and tactile perception can jointly help robots solve complex manipulation tasks. We build a robot system that can see with a camera, hear with a contact microphone, and feel with a vision-based tactile sensor, with all three sensory modalities fused by a self-attention model. Results on two challenging tasks, dense packing and pouring, demonstrate the necessity and power of multisensory perception for robotic manipulation: vision conveys the global status of the robot and the scene but can suffer from occlusion; audio provides immediate feedback on key moments, even invisible ones; and touch offers precise local geometry for decision making. Leveraging all three modalities, our robotic system significantly outperforms prior methods.
Citations: 12
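As a minimal sketch of the fusion stage only, assuming per-modality features have already been produced by upstream encoders: the class, dimensions, and action head below are invented stand-ins, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultisensoryFusion(nn.Module):
    """Fuses vision / audio / touch embeddings with self-attention."""
    def __init__(self, dim=128):
        super().__init__()
        # Stand-ins for the camera, contact-mic, and tactile encoders.
        self.proj = nn.ModuleDict(
            {m: nn.Linear(dim, dim) for m in ("vision", "audio", "touch")})
        self.fuse = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, batch_first=True)
        self.head = nn.Linear(dim, 6)   # e.g. an end-effector action

    def forward(self, inputs):          # dict of (B, dim) features
        tokens = torch.stack(
            [self.proj[m](x) for m, x in inputs.items()], dim=1)
        fused = self.fuse(tokens)       # (B, 3, dim) cross-modal mixing
        return self.head(fused.mean(dim=1))

model = MultisensoryFusion()
batch = {m: torch.randn(4, 128) for m in ("vision", "audio", "touch")}
action = model(batch)                   # (4, 6)
```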
Few-Shot Preference Learning for Human-in-the-Loop RL
Conference on Robot Learning Pub Date: 2022-12-06 DOI: 10.48550/arXiv.2212.03363
Joey Hejna, Dorsa Sadigh
Abstract: While reinforcement learning (RL) has become an increasingly popular approach in robotics, designing sufficiently informative reward functions for complex tasks has proven extremely difficult, due to their inability to capture human intent and their vulnerability to policy exploitation. Preference-based RL algorithms seek to overcome these challenges by learning reward functions directly from human feedback. Unfortunately, prior work either requires an unreasonable number of queries, implausible for any human to answer, or overly restricts the class of reward functions to guarantee the elicitation of the most informative queries, resulting in models that are insufficiently expressive for realistic robotics tasks. Contrary to most works, which focus on query selection to minimize the amount of data required for learning reward functions, we take the opposite approach: expanding the pool of available data by viewing human-in-the-loop RL through the more flexible lens of multi-task learning. Motivated by the success of meta-learning, we pre-train preference models on prior task data and quickly adapt them to new tasks using only a handful of queries. Empirically, we reduce the amount of online feedback needed to train manipulation policies in Meta-World by 20×, and we demonstrate the effectiveness of our method on a real Franka Panda robot. Moreover, this reduction in query complexity allows us to train robot policies from actual human users. Videos of our results and code can be found at https://sites.google.com/view/few-shot-preference-rl/home.
Citations: 23
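Preference-based RL typically fits a reward model with a Bradley-Terry likelihood over pairwise segment comparisons; the sketch below shows that standard objective plus a short fine-tuning loop to suggest the few-shot adaptation step. The network, shapes, and random data are placeholders, and the paper's actual meta-learning procedure is more involved.

```python
import torch
import torch.nn as nn

# Reward model; in the paper this would start from meta-pretrained weights.
reward = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward.parameters(), lr=1e-3)

def preference_loss(seg_a, seg_b, prefer_a):
    """Bradley-Terry loss: the preferred segment should receive the
    higher summed reward. seg_*: (B, T, obs_dim); prefer_a: (B,) in {0,1}."""
    r_a = reward(seg_a).sum(dim=(1, 2))
    r_b = reward(seg_b).sum(dim=(1, 2))
    return nn.functional.binary_cross_entropy_with_logits(
        r_a - r_b, prefer_a.float())

# Few-shot adaptation: a handful of new-task queries, a few gradient steps.
seg_a, seg_b = torch.randn(16, 50, 8), torch.randn(16, 50, 8)
prefer_a = torch.randint(0, 2, (16,))
for _ in range(10):
    opt.zero_grad()
    preference_loss(seg_a, seg_b, prefer_a).backward()
    opt.step()
```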
Walk These Ways: Tuning Robot Control for Generalization with Multiplicity of Behavior
Conference on Robot Learning Pub Date: 2022-12-06 DOI: 10.48550/arXiv.2212.03238
G. Margolis
Abstract: Learned locomotion policies can rapidly adapt to diverse environments similar to those experienced during training, but they lack a mechanism for fast tuning when they fail in an out-of-distribution test environment. This necessitates a slow, iterative cycle of reward and environment redesign to achieve good performance on a new task. As an alternative, we propose learning a single policy that encodes a structured family of locomotion strategies solving the training tasks in different ways, resulting in Multiplicity of Behavior (MoB). Different strategies generalize differently and can be chosen in real time for new tasks or environments, bypassing the need for time-consuming retraining. We release a fast, robust open-source MoB locomotion controller, Walk These Ways, that can execute diverse gaits with variable footswing, posture, and speed, unlocking diverse downstream tasks: crouching, hopping, high-speed running, stair traversal, bracing against shoves, rhythmic dance, and more. Video and code release: https://gmargo11.github.io/walk-these-ways/
Citations: 29
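A minimal sketch of a behavior-conditioned policy in the spirit of MoB: one network receives the observation concatenated with a behavior vector that can be swapped at run time to select a gait. The parameter names and values below are invented for illustration.

```python
import torch
import torch.nn as nn

obs_dim, b_dim, act_dim = 48, 4, 12    # assumed sizes

policy = nn.Sequential(
    nn.Linear(obs_dim + b_dim, 256), nn.ELU(),
    nn.Linear(256, act_dim))

def act(obs, behavior):
    """Same network, many gaits: behavior parameters (e.g. footswing
    height, body height, stepping frequency) are chosen at run time."""
    return policy(torch.cat([obs, behavior], dim=-1))

obs = torch.randn(1, obs_dim)
crouch = torch.tensor([[0.05, -0.20, 0.0, 2.0]])   # made-up values
hop    = torch.tensor([[0.15,  0.00, 0.5, 4.0]])
a_crouch, a_hop = act(obs, crouch), act(obs, hop)
```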
Learning Representations that Enable Generalization in Assistive Tasks
Conference on Robot Learning Pub Date: 2022-12-05 DOI: 10.48550/arXiv.2212.03175
Jerry Zhi-Yang He, Aditi Raghunathan, Daniel S. Brown, Zackory M. Erickson, A. Dragan
Abstract: Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse "population" of environments (i.e., domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot acts to assist a user (e.g., helping someone with motor impairments bathe or scratch an itch). Such tasks are particularly interesting relative to prior sim2real successes because the environment now contains a human who is also acting. This complicates the problem because the diversity of human users (instead of merely physical environment parameters) is harder to capture in a population, increasing the likelihood of encountering out-of-distribution (OOD) human policies at test time. We advocate that generalization to such OOD policies benefits from (1) learning a good latent representation for human policies to which test-time humans can accurately be mapped, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies from the simulated population alone. We study how best to learn such a representation by evaluating on purposefully constructed OOD test policies. We find that sim2real methods that encode environment (or population) parameters, which work well for tasks robots perform in isolation, do not work well in assistance. In assistance, it seems crucial to train the representation directly on the history of interaction, because that is what the robot will have access to at test time. Further, training these representations to predict human actions not only gives them better structure, but also enables them to be fine-tuned at test time, when the robot observes the partner act. https://adaptive-caregiver.github.io.
Citations: 5
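As a hedged sketch of the representation the paper argues for, the code below encodes the interaction history with a recurrent network and trains the latent to predict the human's next action; because the loss depends only on observed interaction, the same objective can fine-tune the encoder at test time. Architecture and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class HumanEncoder(nn.Module):
    """Maps robot-human interaction history to a latent z that is
    trained to predict the human's next action (illustrative sizes)."""
    def __init__(self, obs_dim=10, act_dim=4, hid=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, hid, batch_first=True)
        self.to_z = nn.Linear(hid, 16)
        self.pred = nn.Linear(16, act_dim)

    def forward(self, history):          # (B, T, obs_dim + act_dim)
        _, h = self.rnn(history)
        z = self.to_z(h[-1])             # latent description of the human
        return z, self.pred(z)           # z conditions the robot policy

enc = HumanEncoder()
history = torch.randn(8, 30, 14)
human_next_action = torch.randn(8, 4)
z, pred = enc(history)
loss = nn.functional.mse_loss(pred, human_next_action)
loss.backward()   # the same update can run at test time on new interaction
```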
Reinforcement Learning with Demonstrations from Mismatched Task under Sparse Reward
Conference on Robot Learning Pub Date: 2022-12-03 DOI: 10.48550/arXiv.2212.01509
Yanjiang Guo, Jingyue Gao, Zheng Wu, Chengming Shi, Jianyu Chen
Abstract: Reinforcement learning often suffers from sparse rewards in real-world robotics problems. Learning from demonstration (LfD), which leverages collected expert data to aid online learning, is an effective way to mitigate this issue. Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task. In this paper, we consider the case where the target task differs from, but is similar to, the expert's task. This setting is challenging, and we find that existing LfD methods cannot effectively guide learning on mismatched new tasks with sparse rewards. We propose conservative reward shaping from demonstration (CRSfD), which shapes the sparse rewards using an estimated expert value function. To accelerate learning, CRSfD guides the agent to explore conservatively around the demonstrations. Experimental results on robot manipulation tasks show that our approach outperforms baseline LfD methods when transferring demonstrations collected in a single task to other different but similar tasks.
Citations: 0
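The abstract does not give the CRSfD equations; the sketch below shows the generic potential-based shaping the method builds on, with an estimated expert value function serving as the potential. The `expert_value` stub and all constants are invented.

```python
import numpy as np

gamma = 0.99

def expert_value(s):
    """Stand-in for a value function estimated from expert
    demonstrations; here a synthetic 'distance to goal' placeholder."""
    return -float(np.linalg.norm(s))

def shaped_reward(s, s_next, sparse_r):
    # Potential-based shaping with the expert value as the potential:
    # it densifies a sparse reward without changing the optimal policy.
    return sparse_r + gamma * expert_value(s_next) - expert_value(s)

s, s_next = np.array([1.0, 1.0]), np.array([0.8, 0.9])
print(shaped_reward(s, s_next, sparse_r=0.0))   # positive: progress
```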
Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula
Conference on Robot Learning Pub Date: 2022-12-02 DOI: 10.48550/arXiv.2212.01375
Eli Bronstein, S. Srinivasan, Supratik Paul, Aman Sinha, Matthew O'Kelly, Payam Nikdel, Shimon Whiteson
Abstract: ML-based motion planning is a promising approach to produce agents that exhibit complex behaviors and automatically adapt to novel environments. In the context of autonomous driving, it is common to treat all available training data equally. However, this approach produces agents that do not perform robustly in safety-critical settings, an issue that cannot be addressed by simply adding more data to the training set: we show that an agent trained using only a 10% subset of the data performs just as well as an agent trained on the entire dataset. We present a method to predict the inherent difficulty of a driving situation given data collected from a fleet of autonomous vehicles deployed on public roads. We then demonstrate that this difficulty score can be used in a zero-shot transfer to generate curricula for an imitation-learning-based planning agent. Compared to training on the entire unbiased training dataset, we show that prioritizing difficult driving scenarios both reduces collisions by 15% and increases route adherence by 14% in closed-loop evaluation, all while using only 10% of the training data.
Citations: 3
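A minimal sketch of turning per-segment difficulty scores into a curriculum by training on only the hardest 10% of logged data, mirroring the paper's headline setting; the difficulty predictor itself is assumed to exist and is replaced here by random scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# difficulty[i]: predicted difficulty of logged driving segment i,
# produced by a separately trained model (assumed; randomized here).
difficulty = rng.random(100_000)

# Keep only the hardest 10% of the corpus for imitation learning.
threshold = np.quantile(difficulty, 0.90)
hard_idx = np.flatnonzero(difficulty >= threshold)

# Sample training minibatches from the difficult subset only.
batch_indices = rng.choice(hard_idx, size=256, replace=False)
```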
Proactive Robot Assistance via Spatio-Temporal Object Modeling
Conference on Robot Learning Pub Date: 2022-11-28 DOI: 10.48550/arXiv.2211.15501
Maithili Patel, S. Chernova
Abstract: Proactive robot assistance enables a robot to anticipate and provide for a user's needs without being explicitly asked. We formulate proactive assistance as the problem of anticipating temporal patterns of object movements associated with everyday user routines, and proactively assisting the user by placing objects to adapt the environment to their needs. We introduce a generative graph neural network that learns a unified spatio-temporal predictive model of object dynamics from temporal sequences of object arrangements. We additionally contribute the Household Object Movements from Everyday Routines (HOMER) dataset, which tracks household objects associated with human activities of daily living across 50+ days for five simulated households. Our model outperforms the leading baseline in predicting object movement, correctly predicting locations for 11.1% more objects and incorrectly predicting locations for 11.5% fewer of the objects used by the human.
Citations: 7
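The paper's model is a generative graph neural network; as a much simpler stand-in that conveys the prediction target without the graph machinery, the sketch below scores candidate locations for each object from its current location and the time of day. All sizes and encodings are invented.

```python
import torch
import torch.nn as nn

n_objects, n_locations = 20, 15    # assumed household sizes

# Simplified proxy for the spatio-temporal model: current location
# (one-hot) plus normalized time of day -> next-location logits.
model = nn.Sequential(
    nn.Linear(n_locations + 1, 64), nn.ReLU(),
    nn.Linear(64, n_locations))

loc = torch.zeros(n_objects, n_locations)
loc[torch.arange(n_objects),
    torch.randint(0, n_locations, (n_objects,))] = 1.0
time_of_day = torch.full((n_objects, 1), 7.5 / 24)   # 7:30 am

logits = model(torch.cat([loc, time_of_day], dim=-1))
next_loc = logits.argmax(dim=-1)   # where the robot would pre-place objects
```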
Learning Bimanual Scooping Policies for Food Acquisition
Conference on Robot Learning Pub Date: 2022-11-26 DOI: 10.48550/arXiv.2211.14652
J. Grannen, Yilin Wu, Suneel Belkhale, Dorsa Sadigh
Abstract: A robotic feeding system must be able to acquire a variety of foods. Prior bite-acquisition works consider single-arm spoon scooping or fork skewering, which do not generalize to foods with complex geometries and deformabilities. For example, when acquiring a group of peas, skewering could smoosh the peas, while scooping without a barrier could result in chasing them around the plate. To acquire foods with such diverse properties, we propose stabilizing food items during scooping with a second arm, for example by pushing peas against the spoon with a flat surface to prevent dispersion. The added stabilizing arm introduces new challenges: critically, it should stabilize the food scene without interfering with the acquisition motion, which is especially difficult for easily breakable, high-risk food items like tofu. These high-risk foods can break between the pusher and the spoon during scooping, causing food to fall out of the spoon and be wasted. We propose a general bimanual scooping primitive and an adaptive stabilization strategy that enable successful acquisition of a diverse set of food geometries and physical properties. Our approach, CARBS (Coordinated Acquisition with Reactive Bimanual Scooping), learns to stabilize without impeding task progress by identifying high-risk foods and robustly scooping them using closed-loop visual feedback. We find that CARBS generalizes across food shape, size, and deformability, and can additionally manipulate multiple food items simultaneously. CARBS achieves 87.0% success on scooping rigid foods, 25.8% more than a single-arm baseline, and reduces food breakage by 16.2% compared to an analytical baseline. Videos can be found at https://sites.google.com/view/bimanualscoop-corl22/home.
Citations: 7
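As an illustration of risk-adaptive stabilization rather than the authors' controller, the toy function below selects gentler pusher parameters when the food's perceived fragility is high; the thresholds and parameter names are made up.

```python
def choose_scoop_params(food_risk):
    """Pick stabilization parameters from a perceived fragility score
    in [0, 1] (e.g. tofu near 1.0, carrots near 0.0). Values invented."""
    if food_risk > 0.7:
        # Fragile food: looser pusher gap, lower force cap.
        return {"pusher_gap_mm": 8.0, "max_force_n": 1.0}
    return {"pusher_gap_mm": 2.0, "max_force_n": 4.0}

params = choose_scoop_params(food_risk=0.9)
# A closed-loop system would re-estimate food_risk from camera feedback
# after each scooping attempt and update these parameters accordingly.
```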