Generalization of temporal logic tasks via future dependent options

Impact Factor 4.3 · CAS Tier 3, Computer Science · JCR Q2, Computer Science, Artificial Intelligence
Duo Xu, Faramarz Fekri
{"title":"Generalization of temporal logic tasks via future dependent options","authors":"Duo Xu, Faramarz Fekri","doi":"10.1007/s10994-024-06614-y","DOIUrl":null,"url":null,"abstract":"<p>Temporal logic (TL) tasks consist of complex and temporally extended subgoals and they are common for many real-world applications, such as service and navigation robots. However, it is often inefficient or even infeasible to train reinforcement learning (RL) agents to solve multiple TL tasks, since rewards are sparse and non-Markovian in these tasks. A promising solution to this problem is to learn task-conditioned policies which can zero-shot generalize to new TL tasks without further training. However, influenced by some practical issues, such as issues of lossy symbolic observation and long time-horizon of completing TL task, previous works suffer from sample inefficiency in training and sub-optimality (or even infeasibility) in task execution. In order to tackle these issues, this paper proposes an option-based framework to generalize TL tasks, consisting of option training and task execution parts. We have innovations in both parts. In option training, we propose to learn options dependent on the future subgoals via a novel approach. Additionally, we propose to train a multi-step value function which can propagate the rewards of satisfying future subgoals more efficiently in long-horizon tasks. In task execution, in order to ensure the optimality and safety, we propose a model-free MPC planner for option selection, circumventing the learning of a transition model which is required by previous MPC planners. In experiments on three different domains, we evaluate the generalization capability of the agent trained by the proposed method, showing its significant advantage over previous methods.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10994-024-06614-y","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Temporal logic (TL) tasks consist of complex and temporally extended subgoals, and they are common in many real-world applications, such as service and navigation robots. However, it is often inefficient or even infeasible to train reinforcement learning (RL) agents to solve multiple TL tasks, since rewards in these tasks are sparse and non-Markovian. A promising solution to this problem is to learn task-conditioned policies that can generalize zero-shot to new TL tasks without further training. However, owing to practical issues such as lossy symbolic observations and the long time horizon required to complete a TL task, previous works suffer from sample inefficiency in training and sub-optimality (or even infeasibility) in task execution. To tackle these issues, this paper proposes an option-based framework for generalizing TL tasks, consisting of an option training part and a task execution part, with innovations in both. In option training, we propose a novel approach for learning options that depend on future subgoals. Additionally, we propose to train a multi-step value function that propagates the rewards of satisfying future subgoals more efficiently in long-horizon tasks. In task execution, to ensure optimality and safety, we propose a model-free MPC planner for option selection, circumventing the learning of a transition model required by previous MPC planners. In experiments on three different domains, we evaluate the generalization capability of agents trained by the proposed method, showing a significant advantage over previous methods.
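
To make the training-side ideas concrete, below is a minimal sketch (not the authors' implementation; the class and function names, network sizes, and tensor shapes are assumptions) of an option policy conditioned on the next K future subgoals, together with an n-step return computation in the spirit of the multi-step value function mentioned in the abstract.

```python
# Hedged sketch, assuming discrete actions and fixed-size subgoal encodings.
import torch
import torch.nn as nn

class FutureDependentOption(nn.Module):
    """Option policy pi(a | s, g_{t:t+K}) conditioned on the K upcoming subgoals."""
    def __init__(self, state_dim, subgoal_dim, n_future, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + subgoal_dim * n_future, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, future_subgoals):
        # state: (batch, state_dim); future_subgoals: (batch, n_future, subgoal_dim)
        x = torch.cat([state, future_subgoals.flatten(start_dim=1)], dim=-1)
        return torch.distributions.Categorical(logits=self.net(x))


def n_step_targets(rewards, values, gamma=0.99, n=5):
    """Multi-step (n-step) value targets: one standard way to propagate sparse
    subgoal rewards over long horizons.

    rewards: tensor of shape (T,); values: critic estimates V(s_0..s_T), shape (T+1,).
    """
    T = rewards.shape[0]
    targets = torch.zeros(T)
    for t in range(T):
        horizon = min(n, T - t)
        g = sum((gamma ** k) * rewards[t + k] for k in range(horizon))
        targets[t] = g + (gamma ** horizon) * values[t + horizon]
    return targets
```

The key design point illustrated here is that the option policy's input includes a window of future subgoals rather than only the current one, so a single trained policy can be reused across tasks by changing the subgoal sequence it is conditioned on.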
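The execution-side idea of a model-free MPC planner can be illustrated with the following hedged sketch: rather than rolling out a learned transition model, candidate orderings of the remaining subgoals are scored directly with a learned value function, and only the first option of the best ordering is executed before re-planning. `value_fn` and the enumeration strategy are hypothetical stand-ins, not the paper's algorithm.

```python
# Hedged sketch of receding-horizon, model-free option selection.
from itertools import permutations

def select_option(state, remaining_subgoals, value_fn, horizon=3):
    """Pick the next subgoal (option) by scoring short candidate orderings.

    value_fn(state, subgoal_sequence) -> estimated return for pursuing the
    subgoals in that order from `state`; assumed to be a trained network.
    """
    if not remaining_subgoals:
        return None
    best_seq, best_score = None, float("-inf")
    # Enumerate short orderings of the remaining subgoals (receding horizon).
    for seq in permutations(remaining_subgoals, min(horizon, len(remaining_subgoals))):
        score = value_fn(state, list(seq))
        if score > best_score:
            best_seq, best_score = seq, score
    # Execute only the first option, then re-plan after it terminates (MPC style).
    return best_seq[0]
```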


Source Journal
Machine Learning
Category: Engineering & Technology – Computer Science: Artificial Intelligence
CiteScore: 11.00
Self-citation rate: 2.70%
Articles published: 162
Review time: 3 months
Journal description: Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.