Adaptive Language-Guided Abstraction from Contrastive Explanations
Andi Peng, Belinda Z. Li, Ilia Sucholutsky, Nishanth Kumar, Julie A. Shah, Jacob Andreas, Andreea Bobu
arXiv:2409.08212 (arXiv - CS - Robotics), 2024-09-12
Abstract
Many approaches to robot learning begin by inferring a reward function from a set of human demonstrations. To learn a good reward, it is necessary to determine which features of the environment are relevant before determining how these features should be used to compute the reward. End-to-end methods for joint feature and reward learning (e.g., using deep networks or program synthesis techniques) often yield brittle reward functions that are sensitive to spurious state features. By contrast, humans can often learn generalizably from a small number of demonstrations by incorporating strong priors about which features of a demonstration are likely meaningful for the task of interest. How do we build robots that leverage this kind of background knowledge when learning from new demonstrations? This paper describes a method named ALGAE (Adaptive Language-Guided Abstraction from Contrastive Explanations), which alternates between using language models to iteratively identify the human-meaningful features needed to explain demonstrated behavior and using standard inverse reinforcement learning techniques to assign weights to those features. Experiments across a variety of simulated and real-world robot environments show that ALGAE learns generalizable reward functions defined on interpretable features from only a small number of demonstrations. Importantly, ALGAE can recognize when features are missing, then extract and define those features without any human input, making it possible to quickly and efficiently acquire rich representations of user behavior.
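The alternation the abstract describes (a language model proposes candidate features, an inverse-RL step fits weights over them, and the loop continues if the fitted reward still fails to explain the demonstrations) can be sketched in a few lines. The sketch below is not the authors' implementation: the feature bank, the scripted lm_propose_features stand-in, and the least-squares fit against a toy reward target are all illustrative assumptions. Real IRL methods (e.g., maximum-entropy IRL) fit weights from trajectories without any reward labels, and in ALGAE the language model can define genuinely new features rather than select from a fixed bank.

```python
import numpy as np

# Toy expert demonstrations: 2-D states moving toward a goal at (0, 1).
GOAL = np.array([0.0, 1.0])
DEMOS = np.array([[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])

# Hypothetical feature bank; the LM stand-in below just selects from it.
FEATURES = {
    "height": lambda s: s[1],  # correlated with the demos, but not the true feature
    "dist_to_goal": lambda s: np.linalg.norm(s - GOAL),
}

def lm_propose_features(step):
    """Stand-in for the language-model query. Scripted here: the first
    guess is incomplete, the second round adds the missing feature."""
    return [["height"], ["height", "dist_to_goal"]][min(step, 1)]

def fit_weights(names):
    """Stand-in for the IRL step: least-squares fit of a linear reward
    R(s) = w . phi(s) to a toy target (the true reward, -dist_to_goal),
    so the sketch stays self-contained and runnable."""
    phi = np.array([[FEATURES[n](s) for n in names] for s in DEMOS])
    target = np.array([-FEATURES["dist_to_goal"](s) for s in DEMOS])
    w, *_ = np.linalg.lstsq(phi, target, rcond=None)
    err = np.linalg.norm(phi @ w - target)  # how well the reward explains the demos
    return w, err

def algae_loop(tol=1e-6, max_iters=5):
    """Alternate between feature proposal and weight fitting until the
    fitted reward explains the demonstrated behavior."""
    for step in range(max_iters):
        names = lm_propose_features(step)
        w, err = fit_weights(names)
        if err < tol:
            break  # current feature set suffices; stop asking for more
    return dict(zip(names, w))

if __name__ == "__main__":
    print(algae_loop())  # ~{'height': 0.0, 'dist_to_goal': -1.0}
```

Running the sketch, the first iteration fits "height" alone and leaves a large residual, so the loop requests another feature; once "dist_to_goal" is added, the fit is exact and the spurious feature's weight collapses to zero, mirroring (in miniature) how the paper's method detects and repairs a missing-feature situation without human input.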