Reinforcement learning for spoken dialogue systems using off-policy natural gradient method

2012 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2012-12-01. DOI: 10.1109/SLT.2012.6424161

Filip Jurčíček

Citations: 2

Abstract

Reinforcement learning methods have been successfully used to optimise dialogue strategies in statistical dialogue systems. Typically, reinforcement learning techniques learn on-policy, i.e., the dialogue strategy is updated online while the system is interacting with a user. An alternative to this approach is off-policy reinforcement learning, which estimates an optimal dialogue strategy offline from a fixed corpus of previously collected dialogues. This paper proposes a novel off-policy reinforcement learning method based on natural policy gradients and importance sampling. The algorithm is evaluated on a spoken dialogue system in the tourist information domain. The experiments indicate that the proposed method learns a dialogue strategy that significantly outperforms the baseline handcrafted dialogue policy.
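As a rough illustration of the importance sampling idea mentioned in the abstract, the sketch below reweights rewards from a fixed corpus of dialogues so that they estimate the expected reward of a different (target) policy. It is not the paper's algorithm: the natural gradient update is not shown, and the function names, the corpus layout, and the choice of the weighted (self-normalised) estimator are all assumptions made for this example.

```python
import math

def trajectory_weight(target_probs, behaviour_probs):
    """Importance weight of one dialogue: product over turns of pi(a|s) / b(a|s).

    Both arguments are per-turn action probabilities (hypothetical inputs);
    every probability must be strictly positive.
    """
    log_w = 0.0
    for p, q in zip(target_probs, behaviour_probs):
        # Summing log-ratios avoids underflow on long dialogues.
        log_w += math.log(p) - math.log(q)
    return math.exp(log_w)

def off_policy_value(corpus):
    """Weighted importance sampling estimate of the target policy's expected reward.

    corpus: list of (target_probs, behaviour_probs, reward) tuples,
    one tuple per previously collected dialogue (assumed layout).
    """
    weights = [trajectory_weight(t, b) for t, b, _ in corpus]
    total = sum(weights)
    return sum(w * r for w, (_, _, r) in zip(weights, corpus)) / total

# Toy corpus of two dialogues with two turns each.
corpus = [
    ([0.5, 0.5], [0.5, 0.5], 10.0),   # target and behaviour policies agree
    ([0.25, 0.5], [0.5, 0.5], 0.0),   # target policy is less likely to act this way
]
estimate = off_policy_value(corpus)
```

Dialogues that the target policy is less likely to produce than the data-collecting policy are down-weighted, which is what allows a strategy to be evaluated offline without new user interactions.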


