Connecting concepts from vision and speech processing

S. Wachsmuth, G. Sagerer
{"title":"Connecting concepts from vision and speech processing","authors":"S. Wachsmuth, G. Sagerer","doi":"10.1109/ISIU.1999.824829","DOIUrl":null,"url":null,"abstract":"This paper addresses the problem of how to establish referential links between interpretations of speech and visual data. In order to get rid of erroneous, vague, or incomplete conceptual descriptions, we propose a probabilistic interaction scheme. The modelling of dependencies and the calculation of inferences are realized by using Bayesian networks. This interaction scheme provides a basis for disambiguation and error recovery. We implemented an interaction component in an assembly task environment. A robot constructor can be instructed by speech and pointing gestures in order to connect primitive component parts of a wooden toy construction kit. The system is evaluated on a test data set which consists of 448 spoken utterances from 16 speakers who name objects on 10 images from different scenes. First results show the effectiveness and robustness of the probabilistic approach.","PeriodicalId":227256,"journal":{"name":"Proceedings Integration of Speech and Image Understanding","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Integration of Speech and Image Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISIU.1999.824829","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

This paper addresses the problem of how to establish referential links between interpretations of speech and visual data. In order to get rid of erroneous, vague, or incomplete conceptual descriptions, we propose a probabilistic interaction scheme. The modelling of dependencies and the calculation of inferences are realized by using Bayesian networks. This interaction scheme provides a basis for disambiguation and error recovery. We implemented an interaction component in an assembly task environment. A robot constructor can be instructed by speech and pointing gestures in order to connect primitive component parts of a wooden toy construction kit. The system is evaluated on a test data set which consists of 448 spoken utterances from 16 speakers who name objects on 10 images from different scenes. First results show the effectiveness and robustness of the probabilistic approach.
连接视觉和语音处理的概念
本文讨论了如何在语音和视觉数据的解释之间建立参考联系的问题。为了消除错误、模糊或不完整的概念描述,我们提出了一个概率交互方案。利用贝叶斯网络实现依赖关系的建模和推理的计算。该交互方案为消歧和错误恢复提供了基础。我们在一个组装任务环境中实现了一个交互组件。机器人建造者可以通过语音和手势指示来连接木制玩具建造套件的原始部件。该系统在一个测试数据集上进行评估,该数据集由16位说话者的448个语音组成,这些说话者在来自不同场景的10张图像上命名物体。第一个结果表明了概率方法的有效性和鲁棒性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信