Focusing computational visual attention in multi-modal human-robot interaction

ICMI-MLMI '10 Pub Date: 2010-11-08 DOI: 10.1145/1891903.1891912

Boris Schauerte, G. Fink


Citations: 51

Abstract

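Identifying verbally and non-verbally referred-to objects is an important aspect of human-robot interaction. Most importantly, it is essential for achieving a joint focus of attention and, thus, natural interaction behavior. In this contribution, we introduce a saliency-based model that reflects how multi-modal referring acts influence visual search, i.e., the task of finding a specific object in a scene. To this end, we combine positional information obtained from pointing gestures with contextual knowledge about the visual appearance of the referred-to object obtained from language. The available information is then integrated into a biologically motivated saliency model that forms the basis for visual search. We demonstrate the feasibility of the proposed approach by presenting the results of an experimental evaluation.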
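The abstract gives no formulas, so the following is only an illustrative sketch of the general idea of fusing a pointing cue with a language cue, not the authors' model. It assumes the pointing gesture has already been reduced to a single image location, models that cue as an isotropic Gaussian prior, models the spoken color word as a target RGB value, and fuses both maps by multiplication. The function names, the `sigma` and `scale` parameters, and the toy scene are all hypothetical, and the bottom-up part of a biologically motivated saliency model is omitted entirely.

```python
import numpy as np


def gaussian_prior(shape, center, sigma):
    """Spatial map that peaks at the pointed-at pixel (row, col). Assumed form."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def color_similarity(image, target_rgb, scale=0.25):
    """Top-down map that is high where a pixel is close to the named color."""
    diff = image.astype(float) - np.asarray(target_rgb, dtype=float)
    dist = np.linalg.norm(diff, axis=-1)  # Euclidean distance in RGB space
    return np.exp(-(dist ** 2) / (2.0 * scale ** 2))


def focus_of_attention(image, pointed_at, target_rgb, sigma=8.0):
    """Fuse both cues multiplicatively and return the most salient pixel."""
    spatial = gaussian_prior(image.shape[:2], pointed_at, sigma)
    appearance = color_similarity(image, target_rgb)
    combined = spatial * appearance
    row, col = np.unravel_index(np.argmax(combined), combined.shape)
    return int(row), int(col)


# Toy scene with RGB values in [0, 1]: gray background, two red patches,
# and a blue distractor directly under the (imprecise) pointing location.
scene = np.full((32, 32, 3), 0.5)
scene[4:7, 4:7] = (1.0, 0.0, 0.0)      # red object far from the gesture
scene[24:27, 24:27] = (1.0, 0.0, 0.0)  # red object near the gesture
scene[20:23, 24:27] = (0.0, 0.0, 1.0)  # blue distractor under the gesture

# Gesture lands on the blue patch; language says "red".
focus = focus_of_attention(scene, pointed_at=(22, 25), target_rgb=(1.0, 0.0, 0.0))
print(focus)
```

In this toy scene neither cue is sufficient alone: the gesture lands on the blue distractor, and the word "red" matches two objects. The product of the two maps selects a pixel on the red patch closest to the gesture. Multiplication is just one simple fusion choice made for this sketch; a weighted sum would be an equally reasonable assumption.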


