{"title":"Iterative policy learning in end-to-end trainable task-oriented neural dialog models","authors":"Bing Liu, Ian Lane","doi":"10.1109/ASRU.2017.8268975","DOIUrl":null,"url":null,"abstract":"In this paper, we present a deep reinforcement learning (RL) framework for iterative dialog policy optimization in end-to-end task-oriented dialog systems. Popular approaches in learning dialog policy with RL include letting a dialog agent to learn against a user simulator. Building a reliable user simulator, however, is not trivial, often as difficult as building a good dialog agent. We address this challenge by jointly optimizing the dialog agent and the user simulator with deep RL by simulating dialogs between the two agents. We first bootstrap a basic dialog agent and a basic user simulator by learning directly from dialog corpora with supervised training. We then improve them further by letting the two agents to conduct task-oriented dialogs and iteratively optimizing their policies with deep RL. Both the dialog agent and the user simulator are designed with neural network models that can be trained end-to-end. Our experiment results show that the proposed method leads to promising improvements on task success rate and total task reward comparing to supervised training and single-agent RL training baseline models.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"87","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2017.8268975","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 87
Abstract
In this paper, we present a deep reinforcement learning (RL) framework for iterative dialog policy optimization in end-to-end task-oriented dialog systems. A popular approach to learning dialog policy with RL is to let a dialog agent learn against a user simulator. Building a reliable user simulator, however, is not trivial; it is often as difficult as building a good dialog agent. We address this challenge by jointly optimizing the dialog agent and the user simulator with deep RL, simulating dialogs between the two agents. We first bootstrap a basic dialog agent and a basic user simulator by learning directly from dialog corpora with supervised training. We then improve them further by letting the two agents conduct task-oriented dialogs and iteratively optimizing their policies with deep RL. Both the dialog agent and the user simulator are designed as neural network models that can be trained end-to-end. Our experimental results show that the proposed method leads to promising improvements in task success rate and total task reward compared to supervised training and single-agent RL training baselines.
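The abstract outlines a two-stage recipe: supervised bootstrapping of both models from dialog corpora, followed by joint policy optimization on dialogs the two agents simulate with each other. Below is a minimal sketch of that second, iterative stage, assuming a REINFORCE-style update and a shared task reward; the `Policy` network, `ToyDialogEnv`, reward shape, and all hyperparameters are illustrative assumptions, not the paper's actual models.

```python
# Minimal sketch (not the authors' implementation) of jointly training a
# dialog agent and a user simulator with policy gradients on dialogs they
# simulate with each other. The supervised bootstrapping stage the paper
# describes is omitted here.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, HIDDEN = 8, 4, 32  # illustrative sizes

class Policy(nn.Module):
    """Tiny stand-in for the paper's end-to-end trainable dialog models."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, HIDDEN), nn.Tanh(),
                                 nn.Linear(HIDDEN, N_ACTIONS))
    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

class ToyDialogEnv:
    """Hypothetical task: the dialog succeeds when the agent's action
    matches the user's (a crude proxy for completing the user's goal)."""
    def reset(self):
        self.turn = 0
        return torch.randn(STATE_DIM)
    def step(self, user_act, agent_act):
        self.turn += 1
        success = (user_act == agent_act).item()
        reward = 1.0 if success else -0.05   # small per-turn penalty
        done = success or self.turn >= 10
        return torch.randn(STATE_DIM), reward, done

def rollout(agent, user, env):
    """Simulate one dialog; collect each policy's log-probs and the rewards."""
    state, logp_a, logp_u, rewards, done = env.reset(), [], [], [], False
    while not done:
        u_dist = user(state)                 # user simulator acts
        u_act = u_dist.sample()
        a_dist = agent(state)                # dialog agent responds
        a_act = a_dist.sample()
        state, r, done = env.step(u_act, a_act)
        logp_u.append(u_dist.log_prob(u_act))
        logp_a.append(a_dist.log_prob(a_act))
        rewards.append(r)
    return logp_a, logp_u, rewards

def reinforce(logps, rewards, opt, gamma=0.95):
    """One REINFORCE step: ascend the return-weighted log-likelihood."""
    g, returns = 0.0, []
    for r in reversed(rewards):              # discounted returns, back to front
        g = r + gamma * g
        returns.insert(0, g)
    loss = -(torch.stack(logps) * torch.tensor(returns)).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

agent, user, env = Policy(), Policy(), ToyDialogEnv()
opt_a = torch.optim.Adam(agent.parameters(), lr=1e-3)
opt_u = torch.optim.Adam(user.parameters(), lr=1e-3)
for episode in range(500):
    logp_a, logp_u, rewards = rollout(agent, user, env)
    reinforce(logp_a, rewards, opt_a)        # both policies share the task reward
    reinforce(logp_u, rewards, opt_u)
```

Updating both optimizers with the same task reward reflects the collaborative setting the abstract describes: agent and simulator are jointly rewarded for completing the user's goal, so each round of simulated dialogs improves both policies at once.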