Voting-Based Backchannel Timing Prediction Using Audio-Visual Information

Proceedings of the Fourth International Conference on Human Agent Interaction. Pub Date: 2016-10-04. DOI: 10.1145/2974804.2980501

T. Nishide, Kei Shimonishi, H. Kawashima, T. Matsuyama

Abstract

Although many spoken dialog systems have been developed recently, users must still summarize and clearly convey what they want the system to do. In human dialog, however, a speaker often summarizes what to say incrementally, provided that a good listener responds to the speaker's utterances at appropriate times. We consider that generating backchannel responses that overlap with the user's utterances, where appropriate, is crucial for an artificial listener system that promotes the user's utterances, since such overlaps are the norm in human dialog. Toward realizing such a listener system, this paper proposes a voting-based algorithm that uses audio-visual information to predict the end of an utterance early (i.e., before the utterance ends). In the evaluation, we demonstrate the effectiveness of using audio-visual information and the applicability of the voting-based prediction algorithm with some early results.
