Voting-Based Backchannel Timing Prediction Using Audio-Visual Information

T. Nishide, Kei Shimonishi, H. Kawashima, T. Matsuyama
Proceedings of the Fourth International Conference on Human Agent Interaction (HAI '16), published 2016-10-04
DOI: 10.1145/2974804.2980501

Abstract

Although many spoken dialog systems have been developed recently, they require users to summarize what they want the system to do and to convey it clearly. In human dialog, however, a speaker often summarizes what to say incrementally, provided there is a good listener who responds to the speaker's utterances at appropriate timing. We consider that generating backchannel responses that overlap with the user's utterances, where appropriate, is crucial for an artificial listener system that encourages users to speak, since such overlaps are the norm in human dialog. Toward realizing such a listener system, this paper proposes a voting-based algorithm that uses audio-visual information to predict the end of an utterance early (i.e., before the utterance actually ends). In the evaluation, early results demonstrate the effectiveness of using audio-visual information and the applicability of the voting-based prediction algorithm.