DNN-Rule Hybrid Dyna-Q for Sample-Efficient Task-Oriented Dialog Policy Learning

Mingxin Zhang, T. Shinozaki
{"title":"DNN-Rule Hybrid Dyna-Q for Sample-Efficient Task-Oriented Dialog Policy Learning","authors":"Mingxin Zhang, T. Shinozaki","doi":"10.23919/APSIPAASC55919.2022.9980344","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) is a powerful strategy for making a flexible task-oriented dialog agent, but it is weak in learning speed. Deep Dyna-Q augments the agent's experience to improve the learning efficiency by internally simulating the user's behavior. It uses a deep neural network (DNN) based learnable user model to predict user action, reward, and dialog termination from the dialog state and the agent's action. However, it still needs many agent-user interactions to train the user model. We propose a DNN-Rule hybrid user model for Dyna-Q, where the DNN only simulates the user action. Instead, a rule-based function infers the reward and the dialog termination. We also investigate the training with rollout to further enhance the learning efficiency. Experiments on a movie-ticket booking task demonstrate that our approach significantly improves learning efficiency.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/APSIPAASC55919.2022.9980344","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Reinforcement learning (RL) is a powerful strategy for building a flexible task-oriented dialog agent, but its learning speed is slow. Deep Dyna-Q improves learning efficiency by internally simulating the user's behavior to augment the agent's experience. It uses a deep neural network (DNN) based learnable user model to predict the user action, reward, and dialog termination from the dialog state and the agent's action. However, it still needs many agent-user interactions to train the user model. We propose a DNN-Rule hybrid user model for Dyna-Q, in which the DNN simulates only the user action, while a rule-based function infers the reward and the dialog termination. We also investigate training with rollout to further enhance learning efficiency. Experiments on a movie-ticket booking task demonstrate that our approach significantly improves learning efficiency.
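To make the proposed split concrete, the following is a minimal Python/PyTorch sketch of how a DNN-Rule hybrid user model could be wired for Dyna-Q planning. It is not the authors' implementation: the class and function names (UserActionModel, rule_based_reward_and_termination, simulate_turn), the network size, and the reward values are illustrative assumptions; only the division of labor, DNN for the user action and rules for reward and termination, follows the abstract.

    # Minimal sketch (not the authors' code) of a DNN-Rule hybrid user model
    # for Dyna-Q planning. Names, dimensions, and reward values are assumptions.
    import torch
    import torch.nn as nn

    class UserActionModel(nn.Module):
        """DNN that predicts the next user action from the dialog state and agent action."""
        def __init__(self, state_dim: int, agent_action_dim: int,
                     user_action_dim: int, hidden: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + agent_action_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, user_action_dim),  # logits over discrete user actions
            )

        def forward(self, state: torch.Tensor, agent_action: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([state, agent_action], dim=-1))

    def rule_based_reward_and_termination(turn: int, task_complete: bool,
                                          max_turns: int = 40) -> tuple[float, bool]:
        """Hand-crafted stand-in for the rule-based part: small per-turn penalty,
        large bonus on task success, failure when the turn limit is reached."""
        if task_complete:
            return 2.0 * max_turns, True
        if turn >= max_turns:
            return -float(max_turns), True
        return -1.0, False

    def simulate_turn(model: UserActionModel, state: torch.Tensor,
                      agent_action: torch.Tensor, turn: int, task_complete: bool):
        """One simulated (planning) turn: the DNN proposes the user action,
        while the rules supply the reward and the termination flag."""
        with torch.no_grad():
            logits = model(state, agent_action)
            user_action = torch.argmax(logits, dim=-1)
        reward, done = rule_based_reward_and_termination(turn, task_complete)
        return user_action, reward, done

In a Dyna-Q planning step, a call like simulate_turn would stand in for a real user turn, so only the user-action DNN would need to be trained from agent-user interactions; the reward and termination signals come from the fixed rules at no data cost, which is the intuition behind the claimed gain in sample efficiency.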