Focusing computational visual attention in multi-modal human-robot interaction

ICMI-MLMI '10 · Pub Date: 2010-11-08 · DOI: 10.1145/1891903.1891912

Boris Schauerte, G. Fink

Citations: 51

Abstract

Identifying objects that are referred to verbally and non-verbally is an important aspect of human-robot interaction. Most importantly, it is essential for achieving a joint focus of attention and, thus, natural interaction behavior. In this contribution, we introduce a saliency-based model that reflects how multi-modal referring acts influence visual search, i.e., the task of finding a specific object in a scene. To this end, we combine positional information obtained from pointing gestures with contextual knowledge about the visual appearance of the referred-to object obtained from language. The available information is then integrated into a biologically motivated saliency model that forms the basis for visual search. We demonstrate the feasibility of the proposed approach by presenting the results of an experimental evaluation.
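The abstract does not say how the two cues are fused, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes that the linguistic cue is a color term already mapped to an RGB value, that the pointing gesture is given as a 2D origin and direction in image coordinates, and that the two maps are combined by simple multiplication. All function names and parameters (`color_similarity_map`, `pointing_prior`, `focus_of_attention`, `sigma`, `sigma_deg`) are hypothetical.

```python
import numpy as np


def color_similarity_map(image, target_rgb, sigma=0.25):
    """Top-down map: close to 1 where a pixel matches the named color (assumed cue)."""
    diff = image.astype(float) / 255.0 - np.asarray(target_rgb, dtype=float) / 255.0
    dist2 = np.sum(diff ** 2, axis=-1)
    return np.exp(-dist2 / (2.0 * sigma ** 2))


def pointing_prior(shape, origin, direction, sigma_deg=10.0):
    """Spatial prior: Gaussian falloff with angular deviation from the pointing ray."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Vector from the pointing origin (x, y) to every pixel.
    vec = np.stack([xs - origin[0], ys - origin[1]], axis=-1).astype(float)
    norm = np.linalg.norm(vec, axis=-1)
    norm[norm == 0] = 1.0  # avoid division by zero at the origin pixel
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    cos = np.clip((vec @ d) / norm, -1.0, 1.0)
    angle = np.degrees(np.arccos(cos))
    return np.exp(-angle ** 2 / (2.0 * sigma_deg ** 2))


def focus_of_attention(image, target_rgb, origin, direction):
    """Combine both cues multiplicatively (an assumption) and pick the peak."""
    combined = color_similarity_map(image, target_rgb) * pointing_prior(
        image.shape[:2], origin, direction
    )
    y, x = np.unravel_index(np.argmax(combined), combined.shape)
    return (int(x), int(y)), combined
```

With two equally red objects in a scene, the pointing prior suppresses the one that lies far from the pointing ray, so the peak of the combined map falls on the object that matches both the gesture and the verbal description.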

