DNN-Rule Hybrid Dyna-Q for Sample-Efficient Task-Oriented Dialog Policy Learning

Mingxin Zhang, T. Shinozaki
{"title":"DNN-Rule Hybrid Dyna-Q for Sample-Efficient Task-Oriented Dialog Policy Learning","authors":"Mingxin Zhang, T. Shinozaki","doi":"10.23919/APSIPAASC55919.2022.9980344","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) is a powerful strategy for making a flexible task-oriented dialog agent, but it is weak in learning speed. Deep Dyna-Q augments the agent's experience to improve the learning efficiency by internally simulating the user's behavior. It uses a deep neural network (DNN) based learnable user model to predict user action, reward, and dialog termination from the dialog state and the agent's action. However, it still needs many agent-user interactions to train the user model. We propose a DNN-Rule hybrid user model for Dyna-Q, where the DNN only simulates the user action. Instead, a rule-based function infers the reward and the dialog termination. We also investigate the training with rollout to further enhance the learning efficiency. Experiments on a movie-ticket booking task demonstrate that our approach significantly improves learning efficiency.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/APSIPAASC55919.2022.9980344","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Reinforcement learning (RL) is a powerful strategy for building a flexible task-oriented dialog agent, but its learning speed is slow. Deep Dyna-Q improves learning efficiency by internally simulating the user's behavior to augment the agent's experience. It uses a deep neural network (DNN) based learnable user model to predict the user action, reward, and dialog termination from the dialog state and the agent's action. However, it still needs many agent-user interactions to train the user model. We propose a DNN-Rule hybrid user model for Dyna-Q, in which the DNN simulates only the user action, while a rule-based function infers the reward and the dialog termination. We also investigate training with rollout to further enhance learning efficiency. Experiments on a movie-ticket booking task demonstrate that our approach significantly improves learning efficiency.
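To make the proposed split concrete, the following is a minimal Python/PyTorch sketch of how a DNN-Rule hybrid user model could be wired for Dyna-Q planning. It is not the authors' implementation: the class and function names (UserActionModel, rule_based_reward_and_termination, simulate_turn), the network size, and the reward values are illustrative assumptions; only the division of labor, DNN for the user action and rules for reward and termination, follows the abstract.

    # Minimal sketch (not the authors' code) of a DNN-Rule hybrid user model
    # for Dyna-Q planning. Names, dimensions, and reward values are assumptions.
    import torch
    import torch.nn as nn

    class UserActionModel(nn.Module):
        """DNN that predicts the next user action from the dialog state and agent action."""
        def __init__(self, state_dim: int, agent_action_dim: int,
                     user_action_dim: int, hidden: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + agent_action_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, user_action_dim),  # logits over discrete user actions
            )

        def forward(self, state: torch.Tensor, agent_action: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([state, agent_action], dim=-1))

    def rule_based_reward_and_termination(turn: int, task_complete: bool,
                                          max_turns: int = 40) -> tuple[float, bool]:
        """Hand-crafted stand-in for the rule-based part: small per-turn penalty,
        large bonus on task success, failure when the turn limit is reached."""
        if task_complete:
            return 2.0 * max_turns, True
        if turn >= max_turns:
            return -float(max_turns), True
        return -1.0, False

    def simulate_turn(model: UserActionModel, state: torch.Tensor,
                      agent_action: torch.Tensor, turn: int, task_complete: bool):
        """One simulated (planning) turn: the DNN proposes the user action,
        while the rules supply the reward and the termination flag."""
        with torch.no_grad():
            logits = model(state, agent_action)
            user_action = torch.argmax(logits, dim=-1)
        reward, done = rule_based_reward_and_termination(turn, task_complete)
        return user_action, reward, done

In a Dyna-Q planning step, a call like simulate_turn would stand in for a real user turn, so only the user-action DNN would need to be trained from agent-user interactions; the reward and termination signals come from the fixed rules at no data cost, which is the intuition behind the claimed gain in sample efficiency.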