Task-based dialogue policy learning based on diffusion models

IF 3.4; Zone 2, Computer Science; JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zhibin Liu, Rucai Pang, Zhaoan Dong
Applied Intelligence, 54(22), 11752-11764. Journal article, published 2024-09-02.
DOI: 10.1007/s10489-024-05810-6
https://link.springer.com/article/10.1007/s10489-024-05810-6
Citations: 0

Abstract

The purpose of task-based dialogue systems is to help users meet their dialogue needs in as few dialogue rounds as possible. As demand grows, dialogue tasks increasingly span multiple domains and become more complex and diverse. Achieving high performance with low computational effort has therefore become an essential metric for multi-domain task-based dialogue systems. This paper proposes a new approach to guided dialogue policy learning. The method introduces a conditional diffusion model into the reinforcement learning Q-learning algorithm to regularise the policy in a diffusion Q-learning manner. The conditional diffusion model is used to learn the action value function, regularise the actions, and sample actions; the sampled actions are used in the policy update, and an additional loss term that maximises the value of the actions is added to the policy update to improve learning efficiency. The proposed method combines the conditional diffusion model with the reinforcement learning TD3 algorithm as the dialogue policy, and uses an inverse reinforcement learning approach to construct a reward estimator that provides rewards for policy updates, thereby completing multi-domain dialogue tasks.
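
The abstract describes a diffusion Q-learning objective: a conditional diffusion policy is trained to reproduce actions (the regularisation term), while an extra loss term pushes its sampled actions toward high Q-values. As a hedged illustration, the sketch below computes that combined loss in the form used by Diffusion-QL (Wang et al.), L(θ) = L_d(θ) - α·mean Q(s, a_0) with α = η / mean|Q|. The paper's exact loss, network architectures and dialogue action encoding are not given in the abstract and may differ. Everything in the sketch is hypothetical: the toy linear noise predictor and critic (`eps_model`, `q_value`), the weight lists `W` and `v`, the step count `T` and the beta schedule. It evaluates the loss only; a real implementation would compute gradients with an autodiff framework, train the critic with TD3's clipped double-Q target and take rewards from the inverse-RL reward estimator, none of which is shown here.

```python
import itertools
import math
import operator
import random

# Hypothetical toy sizes: state dim, action dim, number of diffusion steps.
STATE_DIM, ACTION_DIM, T = 4, 3, 5
rng = random.Random(0)

# DDPM linear variance schedule beta_t and cumulative products alpha_bar_t (assumed values).
BETAS = [1e-4 + (0.1 - 1e-4) * i / (T - 1) for i in range(T)]
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = list(itertools.accumulate(ALPHAS, operator.mul))

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def eps_model(W, a_t, s, t):
    """Hypothetical linear noise predictor eps_theta(a_t | s, t); a real policy would be an MLP."""
    return matvec(W, a_t + s + [t / T])

def diffusion_bc_loss(W, s, a0):
    """DDPM noise-prediction loss ||eps - eps_theta(a_t, s, t)||^2 at a random step t."""
    t = rng.randint(1, T)  # inclusive on both ends
    ab = ALPHA_BARS[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)]
    a_t = [math.sqrt(ab) * x0 + math.sqrt(1 - ab) * e for x0, e in zip(a0, eps)]
    pred = eps_model(W, a_t, s, t)
    return sum((e - p) ** 2 for e, p in zip(eps, pred)) / ACTION_DIM

def sample_action(W, s):
    """Reverse diffusion: denoise Gaussian noise a_T into an action a_0 conditioned on s."""
    a = [rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)]
    for t in range(T, 0, -1):
        beta, alpha, ab = BETAS[t - 1], ALPHAS[t - 1], ALPHA_BARS[t - 1]
        eps_hat = eps_model(W, a, s, t)
        mean = [(x - beta / math.sqrt(1 - ab) * e) / math.sqrt(alpha) for x, e in zip(a, eps_hat)]
        sigma = math.sqrt(beta) if t > 1 else 0.0  # no noise on the final step
        a = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return [max(-1.0, min(1.0, x)) for x in a]  # clip to the assumed action range [-1, 1]

def q_value(v, s, a):
    """Hypothetical linear critic Q(s, a) = v . [s; a]."""
    return sum(w * x for w, x in zip(v, s + a))

def diffusion_ql_policy_loss(W, v, states, actions, eta=1.0):
    """L = L_bc - alpha * mean Q(s, a_0), with alpha = eta / mean|Q| (Diffusion-QL style)."""
    bc = sum(diffusion_bc_loss(W, s, a) for s, a in zip(states, actions)) / len(states)
    qs = [q_value(v, s, sample_action(W, s)) for s in states]
    alpha = eta / (sum(abs(q) for q in qs) / len(qs) + 1e-8)
    return bc - alpha * sum(qs) / len(qs), bc, qs

# Usage on random toy data (8 transitions).
W = [[rng.gauss(0, 0.1) for _ in range(ACTION_DIM + STATE_DIM + 1)] for _ in range(ACTION_DIM)]
v = [rng.gauss(0, 0.1) for _ in range(STATE_DIM + ACTION_DIM)]
states = [[rng.gauss(0, 1) for _ in range(STATE_DIM)] for _ in range(8)]
actions = [[rng.uniform(-1, 1) for _ in range(ACTION_DIM)] for _ in range(8)]
loss, bc, qs = diffusion_ql_policy_loss(W, v, states, actions)
print(round(loss, 4), round(bc, 4))
```

The η-normalisation keeps the Q term on a scale comparable to the diffusion loss regardless of reward magnitude, which is why the gap between `loss` and `bc` never exceeds η in this sketch.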

Source journal: Applied Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 6.60
Self-citation rate: 20.80%
Articles published: 1361
Review time: 5.9 months
Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses real-life manufacturing, defense, management, government and industrial problems that are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.